Welcome to Prism's Documentation!

Prism is a technology for building platform-agnostic workload analysis tools. Tools are built once and are able to run across multiple architectures and environments. Prism targets complex analyses that are latency-tolerant, in contrast to real-time analyses.

Prism aims to improve three main components of designing new analysis tools for research: 1) modularity, 2) design flexibility, and 3) productivity.

Overview

Prism is a framework designed to help analyze dynamic behavior in applications. This dynamic behavior, or workload, is a result of the application and its given inputs and state.

Workloads

One of the main goals behind Prism is providing a straightforward interface to intuitively represent and analyze workloads. A workload can be represented in many ways. Each way has different requirements.

For example, you can represent a workload as a simple assembly instruction trace...:

push   %rbp
push   %rbx
mov    %rsi,%rbp
mov    %edi,%ebx
sub    $0x8,%rsp
callq  4377b0 <_Z17myfuncv>
callq  4261e0 <_ZN5myotherfunc>
mov    %rbp,%rdx
mov    %ebx,%esi
mov    %rax,%rdi
callq  422460 <_ZN5GO>
add    $0x8,%rsp
xor    %eax,%eax
pop    %rbx
pop    %rbp
retq

...or a call graph...:

[figure: a simple call graph]

...or a memory trace...:

ADDR         BYTES
0xdeadbeef   8
0x12345678   4
0x00000000   1
...

...or more complex representations. Fundamentally, all workload representations can be broken down into five event primitives.

Event Primitives

Because of the variety of use-cases being supported, Prism presents workloads as a set of extensible primitives.

Event Primitive   Description
Compute           some transformation of data
Memory            some movement of data
Control Flow      divergence in an event stream
Synchronization   ordering between separate event streams
Context           grouping of events

E.g., an abstract workload is represented as:

...
compute     FLOP,   add,   SIMD4
memory      write,  4B,    <addr1>
memory      read,   16B,   <addr2>
context     func,   enter, hello_world_thread
sync        create, <TID1>
...

Event Generation

Many tools exist to capture workloads. Currently, Valgrind is well supported. DynamoRIO is on its way to good support, and we are experimenting with traces captured with hardware features.

Eventually, we aim to support a broad spectrum of capture tools, covering many applications and hardware architectures.

Each framework has its merits depending on the desired granularity and source of the event trace. Most binary instrumentation frameworks do a good job of observing the instruction stream of general-purpose CPU workloads, but incur large overheads and may perturb results. Hardware support is good for real-time capture, but may have trouble capturing a native-sized workload. Execution-driven simulators are great for fine-grained, low-level traces, but simulation time may be intractable for very large workloads, and the simulator must of course support the application. Additional capture methodologies exist for applications designed in interpreted or managed languages.

Prism recognizes these trade-offs and creates an abstraction to the underlying framework that observes the workload. Events are translated into Prism event primitives, which are then presented to the user for further analysis or simple trace-generation. The component used in a given framework for event generation is a Prism frontend, and the user-defined analysis or trace-generation on those events is a Prism backend. Currently, backends are written as C++ static plugins to Prism. We are interested in expanding support to C++ dynamic libraries and additionally python bindings.

Getting Started

Congrats on getting this far! 🎉

This portion of the documentation will walk you through setting up Prism and creating your first tool. Onwards! 🚀

Quickstart

This page will quickly walk you through building and running Prism.

Building Prism

Note

The default compiler for CentOS 7 and older (gcc <5) does not support C++14. Install and enable the official Devtoolset before compiling.

Clone and build Prism from source:

$ git clone https://github.com/vandal/prism
$ cd prism
$ mkdir build && cd build
$ cmake3 .. # CentOS 7 requires cmake3 package
$ make -j

This creates a build/bin folder containing the prism executable. It can be run in place, or the entire bin folder can be moved, although it’s not advised to move it to a system location.

Running Prism

Prism requires at least two arguments: the backend analysis tool, and the executable application to measure:

$ bin/prism --backend=stgen --executable=./mybinary

The backend is the analysis tool that will analyze the requested events in mybinary. In this example, stgen is the backend that processes events into a special event trace that is used in SynchroTrace.

A third option, --frontend, changes the underlying method for observing the application. By default, this is Valgrind:

$ bin/prism --frontend=valgrind --backend=stgen --executable=./mybinary

Dependencies

PACKAGE    VERSION
gcc/g++    5+
cmake      3.1.3+
make       3.8+
automake   1.13+
autoconf   2.69+
zlib       1.2.7+
git        1.8+

Building your first Prism tool

This example will demonstrate how to get started analyzing a workload. We’ll generate a simple tool that counts the number of memory events in a workload.

Writing Your Tool

First, let’s make a new folder for our backend, called EventCounter, and begin making the backend.

$ cd prism
$ mkdir src/Backends/EventCounter
$ touch src/Backends/EventCounter/EventCounter.hpp

Currently, all backends are created in C++, and inherit from a BackendIface class.

// EventCounter.hpp

#include "Core/Backends.hpp"

class EventHandler : public BackendIface { };

By default, each event is ignored. Let’s override this behavior and keep count of how many memory events pass.

// EventCounter.hpp

#include "Core/Backends.hpp"

class EventHandler : public BackendIface
{
    virtual void onMemEv(const sigil2::MemEvent &ev) override {
        memory_total++;
    }

    unsigned memory_total{0};
};

We keep track of the total memory count in a private member variable, memory_total. If multiple event streams are enabled, a new class instance is created for each stream.

This means we won't be totalling events from the entire workload! A naive fix is to use an atomic variable that all EventCounter instances can access.

// EventCounter.hpp

#include "Core/Backends.hpp"
#include <atomic>

extern std::atomic<unsigned> global_memory_total;

class EventHandler : public BackendIface
{
    virtual void onMemEv(const sigil2::MemEvent &ev) override {
        global_memory_total++;
    }
};

Now let’s optimize our EventHandler to only update our atomic global once at the end when the destructor is called, instead of at every memory event. We’ll also include the two extra functions:

  1. an event requirements function, to let Prism know to generate memory events
  2. a cleanup function, which executes after all event generation and analysis has been performed.

$ touch src/Backends/EventCounter/EventCounter.cpp

// EventCounter.hpp

#ifndef EVENTCOUNTER_H
#define EVENTCOUNTER_H

#include "Core/Backends.hpp"
#include <atomic>

// forward function declarations
void cleanup(void);
sigil2::capabilities requirements(void);

// global memory event counter
extern std::atomic<unsigned> global_memory_total;

class EventHandler : public BackendIface
{
    ~EventHandler() {
        global_memory_total += memory_total;
    }

    virtual void onMemEv(const sigil2::MemEvent &ev) override {
        memory_total++;
    }

    unsigned memory_total{0};
};

#endif

// EventCounter.cpp

#include "EventCounter.hpp"
#include <iostream>

std::atomic<unsigned> global_memory_total{0};

// Event Request
sigil2::capabilities requirements()
{
    using namespace sigil2;
    using namespace sigil2::capability;

    auto caps = initCaps();

    caps[MEMORY] = availability::enabled;

    return caps;
}

// Final Clean up call
void cleanup()
{
    std::cout << "Total Memory Events: " << global_memory_total << std::endl;
}

Registering Your Tool

Let's set up our new tool in Prism. Prism uses static plugins at the moment. This requires altering a bit of Prism source code, but is easy to maintain for a small project.

$ cd src/Core
$ $EDITOR main.cpp

// main.cpp

int main(int argc, char* argv[])
{
    auto config = Config()
        .registerFrontend(/* ... */)
        // register more frontends
        .registerBackend(/* ... */)
        // register more backends
        .parseCommandLine(argc, argv);
    return startPrism(config);
}

We can see all enabled backends and frontends here in one spot. This is clear and efficient when working with a smaller number of tools. Let’s register our backend.

// main.cpp

int main(int argc, char* argv[])
{
    auto config = Config()
        .registerFrontend(/* ... */)
        // register more frontends
        .registerBackend(/* ... */)
        // register more backends
        .registerBackend("EventCounter",
                         {[]{return std::make_unique<::EventHandler>();},
                          {},
                          ::cleanup,
                          ::requirements})
        .parseCommandLine(argc, argv);
    return startPrism(config);
}

The registerBackend member function takes 5 arguments:

  1. The name of the tool—this is used in the command line option.
  2. A function that returns a new instance of our event handler—we’ll use an anonymous function.
  3. A function to take any extra command line options—we aren’t using this so it’ll stay blank.
  4. An end function that is called after all events have been passed to the tool.
  5. A function that returns a set of events required by the Prism tool.

Now let’s make sure the build system knows about our tool. We need to add our tool as a static library to Prism.

$ cd src/Backends/EventCounter
$ cat > CMakeLists.txt <<EOF
> set(TOOLNAME EventCounter)
> set(SOURCES EventCounter.cpp)
>
> add_library(${TOOLNAME} STATIC ${SOURCES})
> set(PRISM_TOOLS_LIBS ${TOOLNAME} PARENT_SCOPE)
> EOF

And now we recompile Prism:

$ cd build
$ cmake ..
$ make -j

Running Your Tool

The new tool can be invoked as:

$ cd build
$ bin/prism --backend=EventCounter --executable=ls

The Profiling Frontend

A frontend is the component that is generating the event stream. By default, this is Valgrind (mostly due to historical reasons).

While it’s tempting to assume that the event generation just works™ you should be aware of the intrinsic nature of the chosen frontend before making any large assumptions.

Valgrind

Valgrind is the default frontend. No additional options are required. The following two command lines are equivalent:

$ bin/prism --backend=simplecount --executable=ls -lah
$ bin/prism --frontend=valgrind --backend=simplecount --executable=ls -lah

Valgrind is a copy & annotate dynamic binary instrumentation tool. This means that the dynamic instruction stream is grouped into blocks, disassembled into Valgrind’s VEX IR, instrumented, and then recompiled just-in-time.

DynamoRIO

DynamoRIO is not built with Prism by default. To enable DynamoRIO as a frontend, build Prism using the following cmake build command:

$ cmake .. -DCMAKE_BUILD_TYPE=release -DENABLE_DRSIGIL:bool=true

DynamoRIO can now be invoked as a frontend:

$ bin/prism --frontend=dynamorio --backend=simplecount --executable=ls -lah

DynamoRIO’s IR exists closer to the ISA than the IR used by Valgrind. Prism converts DynamoRIO IR to event primitives by inspection of each opcode.

Todo

mmm475 to fill in more details

Future

Additional frontends being explored include:

  • LLVM-tracer
  • Contech
  • GPU Ocelot

Events Documentation

Events List

Five event primitives:

  1. memory
  2. compute
  3. synchronization
  4. context
  5. control flow

Memory

Attribute      Details
Type           none, read, write
Address        numeric
Size (Bytes)   numeric

Compute

Attribute        Details
Type             Integer Operation (IOP), Floating Point Operation (FLOP)
Arity            numeric
Size             numeric
Cost Operation   add, sub, mult, div, shift, mov

Synchronization

Attribute   Details
Type        none, spawn, join, barrier, sync, swap, lock, unlock,
            conditional wait, conditional signal, conditional broadcast,
            spin lock, spin unlock
data1       numeric
data2       numeric

Todo

data1/data2 is currently a hack for SynchroTraceGen. Each datum is not necessarily used, depending on the Type. Ideally the amount of data tupled in the event would depend on its Type, but a fixed size is faster to iterate over.

Context

Attribute         Details
Type              none, instruction, basic block, function enter,
                  function exit, thread
ID                numeric
Name (function)   string

Todo

Currently threads are delimited in the event stream with a Sync-Swap event. This should eventually move to a Cxt-Thread event, since the event does not strictly order the threads, and is intended to just group events that follow it.

Control Flow

Note

Control Flow is currently not implemented. This table is intended as a guide for future support.

Attribute          Details
Type               jump, call, return, suspend
Conditional        true, false, condition
Destination Type   instruction, other
Destination        numeric

Notes

Backend Documentation

SimpleCount

Synopsis

$ bin/prism --frontend=FRONTEND --backend=simplecount --executable=mybinary -myoptions

Description

SimpleCount is a demonstrative backend that counts each event type received from a given frontend. These events are aggregated across all threads.

Options

No available options


SynchroTraceGen

Synopsis

$ bin/prism --frontend=FRONTEND --backend=stgen OPTIONS --executable=mybinary -myoptions

Description

SynchroTraceGen is a backend for generating trace files for the SynchroTrace simulation framework.

Each thread detected by SynchroTraceGen is given its own output trace file, named sigil.events-#.out. By default, the output is directly compressed since the trace files can grow very large.

Options

-c NUMBER
Default: 100
Will compress all SynchroTraceGen compute events.
Each compute event will have a maximum of NUMBER local reads or writes

-o PATH
Default: ‘.’
All SynchroTraceGen output will be put in PATH

-l {text,capnp,null}
Default: ‘text’
Choose which logging framework to use.
Regardless of which logger is chosen, a sigil.pthread.out and sigil.stats.out
file will be output.
‘text’ will output an ASCII formatted trace in gzipped files.
‘capnp’ will output a packed CapnProto serialized trace in gzipped files.
‘null’ will not output anything.

Frontend Documentation

Each frontend generates one or more event streams to a Prism backend analysis tool. Each frontend has its own internal representation (IR) of events, so the process of converting frontend IR to Prism event primitives is different for each frontend. For example, Valgrind disassembles each machine instruction into multiple VEX IR statements and expressions; DynamoRIO annotates each instruction in a basic block with specific attributes; the current Perf frontend only supports x86_64 decoding via the Intel XED library.

Valgrind

Synopsis

$ bin/prism --frontend=valgrind OPTIONS --backend=BACKEND --executable=mybinary -myoptions

Description

Uses a heavily modified Callgrind tool, Sigrind, to observe Prism event primitives and pass them to the backend. Valgrind serializes all threads in the target executable, so only one thread’s event stream is passed to the backend at a time. A context switch is signaled with a Prism context event. Because threads are serialized by Valgrind, the target executable is mostly deterministic.

Options

--at-func=FUNCTION_NAME
Default: (NULL)

--start-func=FUNCTION_NAME
Default: (NULL)
Start collecting events at FUNCTION_NAME
If (NULL), then start from the beginning of execution

--stop-func=FUNCTION_NAME
Default: (NULL)
Stop collecting events at FUNCTION_NAME
If (NULL), then stop at the end of execution

--gen-mem={yes,no}
Default: yes
Generate memory events

--gen-comp={yes,no}
Default: yes
Generate compute events

--gen-cf={yes,no}
Default: no
Currently unsupported

--gen-sync={yes,no}
Default: yes
Generate synchronization (thread) events

--gen-instr={yes,no}
Default: yes
Generate ISA instructions
Only instruction addresses are currently supported

--gen-bb={yes,no}
Default: no
Currently unsupported

--gen-fn={yes,no}
Default: no
Sends function enter/exit events along with the function name
Compile with fewer optimizations and with debug flags for best results

Multithreaded Application Support

The Valgrind frontend automatically supports synchronization events in applications that use the POSIX threads library and/or the OpenMP library by intercepting relevant API calls.

Pthreads

Pthreads should be supported for most versions of GCC/libc, because the Pthread API is quite stable.

Pthreads support exists for any application dynamically linked to the Pthreads library.

See Static Library Support for applications that are statically linked.

OpenMP

Only GCC 4.9.2 is officially supported for synchronization event capture, because the implementation of the library is more likely to change between GCC versions.

Dynamically linked OpenMP applications are not supported. Only Static Library Support exists.

Static Library Support

Applications that use a static Pthreads or OpenMP library must be manually linked with the sigil2-valgrind wrapper archive. This can be found in BUILD_DIR/bin/libsglwrapper.a.

For example:

$CC $CFLAGS main.c -Wl,--whole-archive $BUILD_DIR/bin/libsglwrapper.a -Wl,--no-whole-archive

DynamoRIO

Synopsis

$ bin/prism --num-threads=N --frontend=dynamorio OPTIONS --backend=BACKEND --executable=mybinary -myoptions

Description

Note

-DDYNAMORIO_ENABLE=ON must be passed to cmake during configuration to build with DynamoRIO support.

DynamoRIO is a cross-platform dynamic binary instrumentation tool. DynamoRIO runs multithreaded applications natively, which makes results less reproducible than with Valgrind, but analysis is potentially faster on a multi-core architecture: multiple event streams can be processed at once by setting --num-threads > 1.

Options

Todo

options

--num-threads=N

Intel Processor Trace

Synopsis

$ bin/prism --frontend=perf --backend=BACKEND --executable=perf.data

Description

Note

-DPERF_ENABLE=ON must be passed to cmake during configuration to build with Perf PT support.

Intel Processor Trace (Intel PT) is a CPU feature available on Intel processors that are Broadwell or more recent. The trace records only branch results; the full trace is then reconstructed by perf by replaying the binary, including all shared-library loading and context switches. A side effect of capturing only branch results is that runtime information within the trace is lost, such as some memory access addresses; e.g., the perf 'replay' mechanism does not reproduce malloc results.

For more usage details, see: perf design document for Intel PT

For more technical details see: Intel Software Developer’s Manual Volume Three

Options

Note

The perf.data file is generated with: perf record -e intel_pt//u ./myexec

If you receive 'AUX data lost N times out of M!', try increasing the size of the AUX buffer; otherwise a significant portion of the trace may not be reproduced: perf record -m,AUXTRACE_PAGES -e intel_pt//u ./myexec

Todo

options


About

Prism comes from Drexel University’s VLSI & Architecture Lab (VANDAL), headed by Dr. Baris Taskin and in collaboration with Tufts University’s Dr. Mark Hempstead.

The goal of Prism is modular application analysis. It was formed from the need to support multiple projects that study application traces, aimed at data-driven architecture design. This has included early hardware accelerator co-design [SIGIL], as well as uncore design space exploration with multi-threaded workloads [SYNCHROTRACE] [UNCORERPD]. Prism is not interested in changing the functional behavior of an application; instead, it aims to classify events in the application and present those events for further analysis. In this way, Prism does not require that each researcher have an in-depth understanding of binary instrumentation tools.


[SIGIL]S. Nilakantan and M. Hempstead, “Platform-independent analysis of function-level communication in workloads”, 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 196 - 206, 2013.
[SYNCHROTRACE]S. Nilakantan, K. Sangaiah, A. More, G. Salvadory, B. Taskin and M. Hempstead, “Synchrotrace: synchronization-aware architecture-agnostic traces for light-weight multicore simulation”, 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 278 - 287, 2015.
[UNCORERPD]K. Sangaiah, M. Hempstead and B. Taskin, “Uncore RPD: Rapid design space exploration of the uncore via regression modeling”, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 365 - 372, 2015.

Changelog

Versioning is generally based on semantic versioning.

1.0.0 (2018-4-9)

User Notes

Initial release for our ISPASS'18 publication.

Events

Control Flow events are not currently implemented, although a provisional interface is provided in Events Documentation.

Frontends

Valgrind is fairly well supported. It can be slow at event generation, although a faster version is in the works.

DynamoRIO is less well supported, but should generate basic memory, compute, and synchronization events fairly well. This is planned to be updated.

The Intel PT perf frontend was implemented as a proof-of-concept and there is a lot of room for increased event support and optimization.

Developer Notes

A new Valgrind implementation (gengrind) is close to being completed. Currently we are working on how best to implement branching with VEX IR, which seems to only support limited branching. The function tracking component of gengrind is based on Callgrind. A few bugs may be present in this new function tracking, since we stripped away some Callgrind-specific functionality, such as cost-centers and cache simulation.

The DynamoRIO frontend requires some extra event checks to make sure the raw instructions it sees are properly binned. Additionally, some restrictions in its internal lock implementations make detecting and generating synchronization events more costly than we would expect. Specifically, we cannot directly generate events in function intercepts, and must instead set a flag that gets checked in every basic block. Also, we plan to look into thread-private code-caches to optimize ROI checks that happen at the beginning of each basic block.

Features

  • Flexible application analysis
    • Use multiple frontends for capturing software workloads like Valgrind and DynamoRIO
    • Use custom C++14 libraries for analyzing event streams
  • Platform-independent events
    • Straight-forward and extensible format, simplifying analysis

Installation

See the Quickstart for installation instructions.

License

This project is licensed under the BSD3 license.