InfraScraper Documentation

Application Overview

Installation

PIP Installation

The release version of infra-scraper is available on PyPI. To install it, run:

pip install infra-scraper

Installation from Source

To install the latest development version into a virtualenv, run the following commands:

git clone git@github.com:cznewt/infra-scraper.git
cd infra-scraper
virtualenv venv
source venv/bin/activate
python setup.py install

Configuration

A single configuration file covers all providers. The default location is /etc/infra-scraper/config.yaml, but it can be overridden with the INFRA_SCRAPER_CONFIG_PATH environment variable, for example:

export INFRA_SCRAPER_CONFIG_PATH=~/scraper.yml

Configuration in ETCD

You can use ETCD as a storage backend for the configuration and scrape results. Set the following environment variables:

export INFRA_SCRAPER_CONFIG_BACKEND=etcd
export INFRA_SCRAPER_CONFIG_PATH=/service/scraper/config
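A minimal sketch of how the two variables could be resolved into a configuration location. It assumes that, when INFRA_SCRAPER_CONFIG_BACKEND is unset, the configuration is read from a local YAML file; the function name `config_location` and the `"file"` backend label are hypothetical, not part of infra-scraper's code.

```python
import os

DEFAULT_CONFIG_PATH = "/etc/infra-scraper/config.yaml"


def config_location(environ=None):
    """Return (backend, path) from the INFRA_SCRAPER_CONFIG_* variables (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: without INFRA_SCRAPER_CONFIG_BACKEND the config is a local YAML file,
    # labelled "file" here purely for illustration.
    backend = env.get("INFRA_SCRAPER_CONFIG_BACKEND", "file")
    path = env.get("INFRA_SCRAPER_CONFIG_PATH", DEFAULT_CONFIG_PATH)
    if backend == "file":
        # Expand "~" in case the variable was set without shell tilde expansion
        path = os.path.expanduser(path)
    return backend, path
```

With an empty environment the function falls back to the documented default path; with the two `export` lines above it returns `("etcd", "/service/scraper/config")`.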

Storage Configuration

You can set the local filesystem path where scraped data will be saved.

storage:
  backend: localfs
  path: /tmp/scraper
endpoints: {}

You can also set the scraping storage backend to use the ETCD service instead of a local filesystem backend.

storage:
  backend: etcd
  path: /scraper
endpoints: {}

Endpoints Configuration

Each endpoint kind expects a slightly different set of configuration parameters. See the individual sections under Supported Platforms for samples of the parameters required to set up each endpoint.
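Before adding endpoints, it can help to sanity-check the parsed configuration file (for example the dict returned by PyYAML's `yaml.safe_load`). The sketch below is a hypothetical `validate_config` helper, not something infra-scraper provides; the accepted storage backends and endpoint kinds are simply the names used in the samples in this document.

```python
# Names taken from the samples in this document; the validation rules are hypothetical.
STORAGE_BACKENDS = {"localfs", "etcd"}
ENDPOINT_KINDS = {"aws", "kubernetes", "openstack", "salt", "terraform"}


def validate_config(config):
    """Return a list of human-readable problems found in a parsed config mapping."""
    problems = []
    storage = config.get("storage") or {}
    if storage.get("backend") not in STORAGE_BACKENDS:
        problems.append(f"storage.backend must be one of {sorted(STORAGE_BACKENDS)}")
    if not storage.get("path"):
        problems.append("storage.path is required")
    # An empty `endpoints:` key in YAML parses to None, so fall back to {}
    for name, endpoint in (config.get("endpoints") or {}).items():
        if endpoint.get("kind") not in ENDPOINT_KINDS:
            problems.append(f"endpoint {name!r} has unknown kind {endpoint.get('kind')!r}")
        if not isinstance(endpoint.get("config"), dict):
            problems.append(f"endpoint {name!r} needs a 'config' mapping")
        if not isinstance(endpoint.get("layouts", []), list):
            problems.append(f"endpoint {name!r}: 'layouts' should be a list")
    return problems
```

An empty list means the file passed these checks; the storage samples above, with `endpoints: {}`, produce no problems.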

Usage

The application comes with several entry commands:

Scraping Commands

scraper_get <endpoint-name>

Scrape single endpoint once.

scraper_get_forever <endpoint-name>

Scrape single endpoint continuously.

scraper_get_all

Scrape all defined endpoints once.

scraper_get_all_forever

Scrape all defined endpoints continuously.

UI and Utility Commands

scraper_status

Display the service status, endpoints, scrapes, etc.

scraper_web

Start the web UI with visualization samples and an API that serves the scraped data.

Supported Platforms

Amazon Web Services

AWS scraping uses boto3, the high-level AWS Python SDK, for accessing and manipulating AWS resources.

endpoints:
  aws-us-west-2-admin:
    kind: aws
    config:
      region: us-west-2
      aws_access_key_id: <access_key_id>
      aws_secret_access_key: <secret_access_key>
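The endpoint `config` keys map closely onto boto3 session arguments, except that boto3 calls the region `region_name`. A minimal sketch of that translation, assuming the values are passed straight to boto3 (the helper name `boto3_session_kwargs` is hypothetical):

```python
def boto3_session_kwargs(config):
    """Translate an `aws` endpoint's config mapping into boto3 Session keyword arguments."""
    return {
        "region_name": config["region"],  # boto3 names the region `region_name`
        "aws_access_key_id": config["aws_access_key_id"],
        "aws_secret_access_key": config["aws_secret_access_key"],
    }
```

With boto3 installed, the result could be used as `boto3.session.Session(**kwargs)`, and service clients created from that session, for example `session.client("ec2")`.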

Kubernetes Clusters

Kubernetes scraping requires some information from the kubeconfig file. Provide the cluster and user parameters to the scraper; they can be found under the corresponding keys in the Kubernetes configuration file.

endpoints:
  k8s-admin:
    kind: kubernetes
    layouts:
    - force
    - hive
    config:
      cluster:
        server: https://kubernetes-api:443
        certificate-authority-data: |
          <ca-for-server-and-clients>
      user:
        client-certificate-data: |
          <client-cert-public>
        client-key-data: |
          <client-cert-private>
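Rather than copying the values by hand, the `cluster` and `user` sections can be pulled out of a parsed kubeconfig, which holds `clusters`, `users` and `contexts` lists plus a `current-context` key. A hypothetical helper, assuming the kubeconfig has already been loaded into a dict (for example with `yaml.safe_load`):

```python
def _by_name(entries, name):
    """Find the entry with the given name in a kubeconfig list section."""
    for entry in entries:
        if entry["name"] == name:
            return entry
    raise KeyError(name)


def endpoint_config_from_kubeconfig(kubeconfig, context_name=None):
    """Build the endpoint's `cluster`/`user` mapping from a parsed kubeconfig dict."""
    name = context_name or kubeconfig["current-context"]
    context = _by_name(kubeconfig["contexts"], name)["context"]
    cluster = _by_name(kubeconfig["clusters"], context["cluster"])["cluster"]
    user = _by_name(kubeconfig["users"], context["user"])["user"]
    # Keep only the keys shown in the endpoint sample above
    cluster_keys = ("server", "certificate-authority-data")
    user_keys = ("client-certificate-data", "client-key-data")
    return {
        "cluster": {key: cluster[key] for key in cluster_keys if key in cluster},
        "user": {key: user[key] for key in user_keys if key in user},
    }
```

The returned dict has the same shape as the `config` block of the `k8s-admin` sample.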

OpenStack Clouds

Configuration for Keystone v2 and Keystone v3 clouds. The following config scrapes a single tenant:

endpoints:
  os-v2-admin:
    kind: openstack
    scope: local
    layouts:
    - hive
    config:
      region_name: RegionOne
      auth:
        username: admin
        password: password
        project_name: admin
        auth_url:  https://keystone-api:5000/v2.0

The following config scrapes resources from the entire cloud:

endpoints:
  os-v2-admin:
    kind: openstack
    scope: global
    layouts:
    - hive
    config:
      region_name: RegionOne
      auth:
        username: admin
        password: password
        project_name: admin
        auth_url:  https://keystone-api:5000/v2.0

SaltStack Infrastructures

Configuration for connecting to Salt API.

endpoints:
  salt-global:
    kind: salt
    layouts:
    - hive
    config:
      auth_url: http://127.0.0.1:8000
      username: salt-user
      password: password
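salt-api's `rest_cherrypy` module authenticates through a `POST /login` call that carries the username, the password and an external-auth backend name. The sketch below builds (but does not send) that request from the endpoint config above. `eauth="pam"` is an assumption, since the sample does not say which backend the Salt master uses, and the helper name is hypothetical.

```python
import json
import urllib.request


def salt_login_request(config, eauth="pam"):
    """Build (but do not send) a salt-api /login request from an endpoint config."""
    body = json.dumps({
        "username": config["username"],
        "password": config["password"],
        "eauth": eauth,  # assumed; must match the master's external_auth setup
    }).encode("utf-8")
    return urllib.request.Request(
        config["auth_url"].rstrip("/") + "/login",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return a JSON body whose `return[0]["token"]` value is then passed as the `X-Auth-Token` header on subsequent salt-api calls.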

Terraform Templates

Configuration for parsing Terraform templates.

endpoints:
  tf-aws-app:
    kind: terraform
    layouts:
    - hive
    config:
      dir: ~/terraform/two-tier-aws
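A full HCL parser is beyond a short example, but a crude approximation of what a scraper would look for in `dir`, namely the `resource "TYPE" "NAME"` blocks in its `*.tf` files, can be sketched with a regular expression. This is an illustration only, not infra-scraper's actual parser; it ignores modules, `data` blocks and anything more complex.

```python
import re
import tempfile
from pathlib import Path

# Matches block headers such as: resource "aws_instance" "web" {
RESOURCE_RE = re.compile(r'^\s*resource\s+"([^"]+)"\s+"([^"]+)"', re.MULTILINE)


def terraform_resources(directory):
    """Return (type, name) pairs declared in the *.tf files of a directory."""
    root = Path(directory).expanduser()  # the sample uses a "~/..." path
    found = []
    for tf_file in sorted(root.glob("*.tf")):
        found.extend(RESOURCE_RE.findall(tf_file.read_text()))
    return found


# Usage with a throwaway directory standing in for ~/terraform/two-tier-aws
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "main.tf").write_text(
        'resource "aws_instance" "web" {\n}\n'
        'data "aws_ami" "ubuntu" {\n}\n'
        'resource "aws_elb" "web" {\n}\n'
    )
    demo = terraform_resources(tmp)
print(demo)  # [('aws_instance', 'web'), ('aws_elb', 'web')]
```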

Visualization Layouts

Diagrams are symbolic representations of information according to some visualization technique. Diagrams have been used since ancient times, but became more prevalent during the Enlightenment. Sometimes the technique uses a three-dimensional visualization, which is then projected onto a two-dimensional surface. The word graph is sometimes used as a synonym for diagram.

Arc Diagram

An arc diagram is a style of graph drawing in which the vertices of a graph are placed along a line in the Euclidean plane, with edges drawn as semicircles in one of the two half-planes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line.
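The construction above maps directly onto SVG: each vertex gets an x position on a baseline, and an edge between vertices at x1 and x2 becomes a half circle of radius (x2 - x1) / 2 centred between them. A minimal sketch; the function name, spacing and baseline values are illustrative.

```python
def arc_diagram_paths(nodes, edges, spacing=20.0, baseline=100.0):
    """Return one SVG path string per edge, drawn as a semicircle over the baseline."""
    x = {node: i * spacing for i, node in enumerate(nodes)}  # vertices along one line
    paths = []
    for source, target in edges:
        left, right = sorted((x[source], x[target]))
        radius = (right - left) / 2  # semicircle centred midway between the vertices
        # SVG arc command: A rx ry x-axis-rotation large-arc-flag sweep-flag x y
        paths.append(
            f"M {left:g} {baseline:g} A {radius:g} {radius:g} 0 0 1 {right:g} {baseline:g}"
        )
    return paths


print(arc_diagram_paths(["a", "b", "c"], [("c", "a")]))  # ['M 0 100 A 20 20 0 0 1 40 100']
```

Because SVG's y-axis points down, sweep-flag 1 puts every arc in the upper half-plane on screen; a sweep-flag of 0 would draw them below the line instead.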

The use of the phrase “arc diagram” for this kind of drawing follows the use of a similar type of diagram by Wattenberg (2002) to visualize the repetition patterns in strings, by using arcs to connect pairs of equal substrings. However, this style of graph drawing is much older than its name, dating back to the work of Saaty (1964) and Nicholson (1968), who used arc diagrams to study crossing numbers of graphs. An older but less frequently used name for arc diagrams is linear embeddings.

Heer, Bostock & Ogievetsky wrote that arc diagrams “may not convey the overall structure of the graph as effectively as a two-dimensional layout”, but that their layout makes it easy to display multivariate data associated with the vertices of the graph.

Sample Visualizations

_images/arc-diagram.png

Arc diagram of an OpenStack project’s resources (approx. 100 nodes)

Hierarchical Edge Bundling

A compound graph is a frequently encountered type of data set: relations are given between items, and a hierarchy is also defined on the items. Hierarchical Edge Bundling is a method for visualizing such compound graphs. It works by visually bundling the adjacency edges (the non-hierarchical edges) together. The hierarchy is assumed to be shown via a standard tree visualization method; each adjacency edge, modeled as a B-spline curve, is then bent toward the polyline defined by the path through the inclusion edges from one node to the other. This hierarchical bundling reduces visual clutter and also visualizes implicit adjacency edges between parent nodes that result from explicit adjacency edges between their respective child nodes. Furthermore, hierarchical edge bundling is a generic method that can be used in conjunction with existing tree visualization techniques.
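Two steps of the method are small enough to sketch: finding the path through the hierarchy (from the source up to the lowest common ancestor, then down to the target), and pulling that control polygon toward a straight line with a bundling strength beta, using the straightening rule `P'_i = beta * P_i + (1 - beta) * (P_0 + i / (N - 1) * (P_{N-1} - P_0))`. The tree is assumed to be a child-to-parent dict with a single root, node coordinates are left to whatever tree layout is in use, and the function names are hypothetical.

```python
def hierarchy_path(parent, source, target):
    """Nodes on the tree path source -> ... -> lowest common ancestor -> ... -> target."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up, down = ancestors(source), ancestors(target)
    down_set = set(down)
    lca = next(node for node in up if node in down_set)  # assumes one rooted tree
    return up[: up.index(lca) + 1] + down[: down.index(lca)][::-1]


def straighten(points, beta=0.85):
    """Pull control points toward the source-target line; beta=1 keeps them as-is."""
    if len(points) < 2:
        return list(points)
    (x0, y0), (xn, yn) = points[0], points[-1]
    n = len(points) - 1
    result = []
    for i, (x, y) in enumerate(points):
        t = i / n
        line_x, line_y = x0 + t * (xn - x0), y0 + t * (yn - y0)
        result.append((beta * x + (1 - beta) * line_x, beta * y + (1 - beta) * line_y))
    return result
```

The straightened points would then be drawn as a B-spline; D3's `d3.curveBundle.beta(...)` curve performs the same straightening and spline drawing in one step.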

Sample Visualizations

_images/hiearchical-edge-bundling.png

Hierarchical edge bundling of SaltStack services and their relations (approx. 100 nodes)

Force-Directed Graph

Force-directed graph drawing algorithms are used for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.

While graph drawing can be a difficult problem, force-directed algorithms, being physical simulations, usually require no special knowledge about graph theory such as planarity.

Good-quality results can be achieved for graphs of medium size (up to 50–500 vertices); the results usually score very well on the following criteria: uniform edge length, uniform vertex distribution and symmetry. This last criterion is among the most important and is hard to achieve with any other type of algorithm.
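A compact sketch of the force model described above, in the style of Fruchterman and Reingold: every pair of nodes repels with force k*k/d, every edge attracts with force d*d/k, and each step moves a node by its net displacement capped by a "temperature" that cools linearly. The unit-area frame, the parameter values and the function name are illustrative choices, and the O(n²) repulsion loop only suits small graphs.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout of `nodes` (a list) in a unit-area frame."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length
    start_temperature = 0.1

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsive force k*k/d between every pair of nodes
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attractive force d*d/k along every edge
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node along its displacement, capped by the cooling temperature
        t = start_temperature * (1 - step / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                scale = min(length, t) / length
                pos[n][0] += dx * scale
                pos[n][1] += dy * scale

    return {n: (x, y) for n, (x, y) in pos.items()}
```

For two connected nodes the two forces balance at distance k, so `force_layout(["a", "b"], [("a", "b")])` should settle with the nodes roughly sqrt(1/2) apart.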

Sample Visualizations

_images/force-directed-plot.png

Force-directed plot of all OpenStack resources (approx. 3000 nodes)

Hive Plot

The hive plot is a visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes; this mapping is based on structural properties of the network. Edges are drawn as curved links. The result is simple and interpretable.

The purpose of the hive plot is to establish a new baseline for the visualization of large networks: a method that is both general and tunable, and useful as a starting point for visually exploring network structure.
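The node mapping can be sketched as follows, assuming each node is assigned to an axis by a categorical property (for example its resource kind) and placed along that axis by a numeric structural property (for example its degree). The `axis_of` and `value_of` callbacks and the radii are hypothetical parameters; values are normalised over all nodes here, although per-axis normalisation is an equally valid choice.

```python
import math


def hive_coordinates(nodes, axis_of, value_of, inner=20.0, outer=200.0):
    """Place each node on one of several radial axes; returns {node: (x, y)}."""
    axes = sorted({axis_of(n) for n in nodes})  # one axis per category
    values = [value_of(n) for n in nodes]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    coords = {}
    for n in nodes:
        angle = 2 * math.pi * axes.index(axis_of(n)) / len(axes)
        radius = inner + (value_of(n) - low) / span * (outer - inner)
        coords[n] = (radius * math.cos(angle), radius * math.sin(angle))
    return coords


# Illustrative data: axis = resource kind, position along the axis = node degree
kind = {"net1": "network", "net2": "network", "vm1": "server"}
degree = {"net1": 1, "net2": 3, "vm1": 2}
coords = hive_coordinates(list(kind), kind.get, degree.get)
```

Edges would then be drawn as curves between these axis positions, for example as quadratic Bézier curves.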

Sample Visualizations

_images/hive-plot.png

Hive plot of all OpenStack resources (approx. 3000 nodes)

Adjacency Matrix

An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric. The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory.

The adjacency matrix should be distinguished from the incidence matrix of a graph, a different matrix representation whose elements indicate whether vertex–edge pairs are incident or not, and from the degree matrix, which contains information about the degree of each vertex.
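Building the matrix from an edge list is straightforward; for an undirected graph each edge sets both entry (i, j) and entry (j, i), which is what makes the matrix symmetric. A minimal sketch with illustrative node names:

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Return a (0,1)-matrix with entry [i][j] = 1 when nodes[i] and nodes[j] are adjacent."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        if not directed:
            matrix[j][i] = 1  # undirected edges make the matrix symmetric
    return matrix


def is_symmetric(matrix):
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))


m = adjacency_matrix(["vm", "port", "network"], [("vm", "port"), ("port", "network")])
print(m)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Without self-loops the diagonal stays zero, matching the simple-graph case described above.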

_images/adjacency-matrix.png

Adjacency matrix of an OpenStack project’s resources (approx. 100 nodes)
