InfraScraper Documentation¶
Application Overview¶
Installation¶
PIP Installation¶
The release version of infra-scraper is available on PyPI. To install it, run:
pip install infra-scraper
Installation from Source¶
To bootstrap the latest development version into a virtualenv, run the following commands:
git clone git@github.com:cznewt/infra-scraper.git
cd infra-scraper
virtualenv venv
source venv/bin/activate
python setup.py install
Configuration¶
You provide one configuration file for all providers. The default location is /etc/infra-scraper/config.yaml, but it can be overridden with the INFRA_SCRAPER_CONFIG_PATH environment variable, for example:
export INFRA_SCRAPER_CONFIG_PATH=~/scraper.yml
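The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the rule "environment variable first, default path otherwise"; the function name resolve_config_path is hypothetical and is not taken from the infra-scraper code base.

```python
import os

# Default location named in this section.
DEFAULT_CONFIG_PATH = "/etc/infra-scraper/config.yaml"


def resolve_config_path(environ=None):
    """Return the path from INFRA_SCRAPER_CONFIG_PATH, else the default.

    Hypothetical helper: it only mirrors the rule stated in the text.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("INFRA_SCRAPER_CONFIG_PATH") or DEFAULT_CONFIG_PATH
    # Expand a leading "~" in case the value was set without shell expansion.
    return os.path.expanduser(path)
```

Passing a plain dict instead of os.environ makes the rule easy to check without touching the real environment.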
Configuration in ETCD¶
You can use ETCD as a storage backend for the configuration and the scrape results. The following environment variables need to be set:
export INFRA_SCRAPER_CONFIG_BACKEND=etcd
export INFRA_SCRAPER_CONFIG_PATH=/service/scraper/config
Storage Configuration¶
You can set the local filesystem path where scraped data will be saved.
storage:
  backend: localfs
  path: /tmp/scraper
endpoints: {}
You can also set the scraping storage backend to use the ETCD service instead of a local filesystem backend.
storage:
  backend: etcd
  path: /scraper
endpoints: {}
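As a rough illustration of the two storage samples above, the following sketch checks the storage section of an already parsed configuration. It assumes the YAML file has been loaded into a plain dict and that localfs and etcd are the only backends; validate_storage is a hypothetical helper, not part of infra-scraper.

```python
# The two backends named in this section; treating the list as complete
# is an assumption.
SUPPORTED_BACKENDS = {"localfs", "etcd"}


def validate_storage(config):
    """Return the (backend, path) pair from a parsed configuration dict."""
    storage = config.get("storage")
    if not isinstance(storage, dict):
        raise ValueError("missing 'storage' section")
    backend = storage.get("backend")
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError("unsupported storage backend: %r" % (backend,))
    path = storage.get("path")
    if not path:
        raise ValueError("storage 'path' must be set")
    return backend, path


# The local filesystem sample from this section, as a parsed dict.
local_config = {
    "storage": {"backend": "localfs", "path": "/tmp/scraper"},
    "endpoints": {},
}
backend, path = validate_storage(local_config)
```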
Endpoints Configuration¶
Each endpoint kind expects a slightly different set of configuration parameters. See the individual chapters under Supported Platforms for samples of the parameters required to set up each endpoint.
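The per-kind differences can be summarized as a lookup table. The sketch below collects the config keys that appear in the samples under Supported Platforms and reports which ones an endpoint definition lacks. Treating every sampled key as required is an assumption, and missing_keys is a hypothetical helper.

```python
# Keys taken from the samples in the Supported Platforms chapters.
# Assumption: every key shown in a sample is required.
SAMPLE_CONFIG_KEYS = {
    "aws": {"region", "aws_access_key_id", "aws_secret_access_key"},
    "kubernetes": {"cluster", "user"},
    "openstack": {"region_name", "auth"},
    "salt": {"auth_url", "username", "password"},
    "terraform": {"dir"},
}


def missing_keys(endpoint):
    """Return the sorted config keys an endpoint lacks for its kind."""
    kind = endpoint.get("kind")
    if kind not in SAMPLE_CONFIG_KEYS:
        raise ValueError("unknown endpoint kind: %r" % (kind,))
    present = set(endpoint.get("config") or {})
    return sorted(SAMPLE_CONFIG_KEYS[kind] - present)
```

For example, a salt endpoint that only sets auth_url would be reported as missing password and username.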
Usage¶
The application comes with several entry commands:
Scraping Commands¶
scraper_get <endpoint-name>
Scrape a single endpoint once.
scraper_get_forever <endpoint-name>
Scrape a single endpoint continuously.
scraper_get_all
Scrape all defined endpoints once.
scraper_get_all_forever
Scrape all defined endpoints continuously.
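The four scraping commands follow a regular naming pattern: one endpoint or all of them, once or continuously. A small helper can pick the right one when the scraper is driven from another Python program. The helper name scrape_command is hypothetical; only the four command names come from the list above.

```python
def scrape_command(endpoint=None, forever=False):
    """Build the argument list for one of the four scraping commands.

    endpoint=None selects the "all endpoints" variants.
    """
    name = "scraper_get" if endpoint else "scraper_get_all"
    if forever:
        name += "_forever"
    return [name, endpoint] if endpoint else [name]
```

The returned list can be handed to subprocess.run, for example subprocess.run(scrape_command("k8s-admin"), check=True), assuming the infra-scraper entry commands are on the PATH.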
UI and Utility Commands¶
scraper_status
Display the service status, endpoints, scrapes, etc.
scraper_web
Start the web UI with visualization samples and the API that provides the scraped data.
Supported Platforms¶
Amazon Web Services¶
AWS scraping uses boto3, the high-level AWS Python SDK for accessing and manipulating AWS resources.
endpoints:
  aws-us-west-2-admin:
    kind: aws
    config:
      region: us-west-2
      aws_access_key_id: <access_key_id>
      aws_secret_access_key: <secret_access_key>
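To show how such an endpoint could map onto boto3, the sketch below turns the config section into keyword arguments for boto3.session.Session. How infra-scraper itself builds its session is not documented here, so this mapping (in particular region to region_name) is an assumption, and boto3_session_kwargs is a hypothetical helper.

```python
def boto3_session_kwargs(endpoint_config):
    """Map the endpoint's config keys to boto3 Session keyword arguments.

    Assumption: the scraper's "region" key corresponds to boto3's
    "region_name"; the credential keys already use boto3's names.
    """
    return {
        "region_name": endpoint_config["region"],
        "aws_access_key_id": endpoint_config["aws_access_key_id"],
        "aws_secret_access_key": endpoint_config["aws_secret_access_key"],
    }


# Placeholder values, as in the sample above.
sample = {
    "region": "us-west-2",
    "aws_access_key_id": "<access_key_id>",
    "aws_secret_access_key": "<secret_access_key>",
}
kwargs = boto3_session_kwargs(sample)
```

With boto3 installed, the result could be used as boto3.session.Session(**kwargs).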
Kubernetes Clusters¶
The Kubernetes scraper requires some information from the kubeconfig file. You provide the cluster and user parameters to the scraper; these can be found under the corresponding keys in the Kubernetes configuration file.
endpoints:
  k8s-admin:
    kind: kubernetes
    layouts:
    - force
    - hive
    config:
      cluster:
        server: https://kubernetes-api:443
        certificate-authority-data: |
          <ca-for-server-and-clients>
      user:
        client-certificate-data: |
          <client-cert-public>
        client-key-data: |
          <client-cert-private>
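Finding the right values in a kubeconfig file means following its context to the matching cluster and user entries. The sketch below does this on a kubeconfig that has already been parsed into a dict. The function name is hypothetical. In a kubeconfig the *-data fields hold base64-encoded values; whether the scraper expects them in that form is not stated here, so the values are copied unchanged.

```python
def endpoint_config_from_kubeconfig(kubeconfig, context_name=None):
    """Extract the cluster and user sections for one kubeconfig context."""
    context_name = context_name or kubeconfig["current-context"]
    context = next(c["context"] for c in kubeconfig["contexts"]
                   if c["name"] == context_name)
    cluster = next(c["cluster"] for c in kubeconfig["clusters"]
                   if c["name"] == context["cluster"])
    user = next(u["user"] for u in kubeconfig["users"]
                if u["name"] == context["user"])
    return {
        "cluster": {key: cluster[key]
                    for key in ("server", "certificate-authority-data")},
        "user": {key: user[key]
                 for key in ("client-certificate-data", "client-key-data")},
    }


# Minimal parsed kubeconfig with placeholder values.
kubeconfig = {
    "current-context": "admin@k8s",
    "contexts": [{"name": "admin@k8s",
                  "context": {"cluster": "k8s", "user": "admin"}}],
    "clusters": [{"name": "k8s",
                  "cluster": {"server": "https://kubernetes-api:443",
                              "certificate-authority-data": "<ca>"}}],
    "users": [{"name": "admin",
               "user": {"client-certificate-data": "<cert>",
                        "client-key-data": "<key>"}}],
}
config = endpoint_config_from_kubeconfig(kubeconfig)
```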
OpenStack Clouds¶
OpenStack endpoints can be configured for Keystone v2 and Keystone v3 clouds. Configuration for scraping a single tenant:
endpoints:
  os-v2-admin:
    kind: openstack
    scope: local
    layouts:
    - hive
    config:
      region_name: RegionOne
      auth:
        username: admin
        password: password
        project_name: admin
        auth_url: https://keystone-api:5000/v2.0
Configuration for scraping resources from the entire cloud:
endpoints:
  os-v2-admin:
    kind: openstack
    scope: global
    layouts:
    - hive
    config:
      region_name: RegionOne
      auth:
        username: admin
        password: password
        project_name: admin
        auth_url: https://keystone-api:5000/v2.0
SaltStack Infrastructures¶
Configuration for connecting to the Salt API:
endpoints:
  salt-global:
    kind: salt
    layouts:
    - hive
    config:
      auth_url: http://127.0.0.1:8000
      username: salt-user
      password: password
Terraform Templates¶
Configuration for parsing Terraform templates:
endpoints:
  tf-aws-app:
    kind: terraform
    layouts:
    - hive
    config:
      dir: ~/terraform/two-tier-aws
Visualization Layouts¶
A diagram is a symbolic representation of information according to some visualization technique. Diagrams have been used since ancient times, but became more prevalent during the Enlightenment. Sometimes the technique uses a three-dimensional visualization, which is then projected onto a two-dimensional surface. The word graph is sometimes used as a synonym for diagram.
Arc Diagram¶
An arc diagram is a style of graph drawing, in which the vertices of a graph are placed along a line in the Euclidean plane, with edges being drawn as semicircles in one of the two halfplanes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, line segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line.
The use of the phrase “arc diagram” for this kind of drawing follows the use of a similar type of diagram by Wattenberg (2002) to visualize the repetition patterns in strings, by using arcs to connect pairs of equal substrings. However, this style of graph drawing is much older than its name, dating back to the work of Saaty (1964) and Nicholson (1968), who used arc diagrams to study crossing numbers of graphs. An older but less frequently used name for arc diagrams is linear embeddings.
Heer, Bostock & Ogievetsky wrote that arc diagrams “may not convey the overall structure of the graph as effectively as a two-dimensional layout”, but that their layout makes it easy to display multivariate data associated with the vertices of the graph.
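The geometry behind an arc diagram is simple enough to sketch directly: vertices get evenly spaced positions on a line, and each edge becomes a semicircle centered midway between its endpoints. The function below is an illustrative sketch only; the name arc_layout, the spacing parameter, and the output format are assumptions and do not describe the rendering code used by the UI.

```python
def arc_layout(order, edges, spacing=1.0):
    """Place vertices on a line and describe each edge as a semicircle.

    order: vertices in the sequence they appear along the line.
    edges: iterable of (u, v) pairs.
    Returns (x, arcs): x maps a vertex to its coordinate on the line,
    arcs lists the center and radius of each edge's semicircle.
    """
    x = {vertex: index * spacing for index, vertex in enumerate(order)}
    arcs = []
    for u, v in edges:
        left, right = sorted((x[u], x[v]))
        arcs.append({
            "edge": (u, v),
            "center": (left + right) / 2.0,  # midpoint on the line
            "radius": (right - left) / 2.0,  # half the endpoint distance
        })
    return x, arcs


x, arcs = arc_layout(["a", "b", "c", "d"], [("a", "c"), ("d", "b")])
```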
Hierarchical Edge Bundling¶
A compound graph is a frequently encountered type of data set. Relations are given between items, and a hierarchy is defined on the items as well. Hierarchical Edge Bundling is a new method for visualizing such compound graphs. Our approach is based on visually bundling the adjacency edges, i.e., non-hierarchical edges, together. We realize this as follows. We assume that the hierarchy is shown via a standard tree visualization method. Next, we bend each adjacency edge, modeled as a B-spline curve, toward the polyline defined by the path via the inclusion edges from one node to another. This hierarchical bundling reduces visual clutter and also visualizes implicit adjacency edges between parent nodes that are the result of explicit adjacency edges between their respective child nodes. Furthermore, hierarchical edge bundling is a generic method which can be used in conjunction with existing tree visualization techniques.
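The path that an adjacency edge is bent toward runs from one node up the hierarchy to the lowest common ancestor and back down to the other node. The sketch below computes only that sequence of hierarchy nodes. Turning their positions into a B-spline is left out, and the parent-map representation and function names are assumptions made for the example.

```python
def path_to_root(node, parent):
    """Return [node, its parent, ..., root] using a child -> parent map."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def control_path(a, b, parent):
    """Hierarchy nodes from a up to the lowest common ancestor, down to b."""
    up = path_to_root(a, parent)
    down = path_to_root(b, parent)
    ancestors_of_b = set(down)
    # The first ancestor of a that is also an ancestor of b is the LCA.
    lca = next(node for node in up if node in ancestors_of_b)
    return up[:up.index(lca) + 1] + list(reversed(down[:down.index(lca)]))


# Small hierarchy: root has groups g1 and g2; a and b are in g1, c in g2.
parent = {"root": None, "g1": "root", "g2": "root",
          "a": "g1", "b": "g1", "c": "g2"}
```

Here an edge between a and c is routed through g1, root and g2, while an edge between a and b only passes through their shared group g1.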
Force-Directed Graph¶
Force-directed graph drawing algorithms are used for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.
While graph drawing can be a difficult problem, force-directed algorithms, being physical simulations, usually require no special knowledge about graph theory such as planarity.
Good-quality results can be achieved for graphs of medium size (up to 50–500 vertices). The results are usually very good on the following criteria: uniform edge length, uniform vertex distribution, and visible symmetry. The last criterion is among the most important ones and is hard to achieve with any other type of algorithm.
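A minimal force simulation shows the idea: every pair of nodes repels, every edge acts as a spring, and positions are nudged along the net force for a fixed number of steps. This is a simplified sketch with made-up constants, not the D3 force layout used by the visualizations. The function name and all parameters are assumptions.

```python
import math


def force_layout(nodes, edges, iterations=300, rest_length=1.0,
                 repulsion=0.1, step=0.05):
    """Return {node: (x, y)} after a simple spring/repulsion simulation."""
    count = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {v: [math.cos(2 * math.pi * i / count),
               math.sin(2 * math.pi * i / count)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair, falling off with squared distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[u][0] += push * dx / dist
                force[u][1] += push * dy / dist
                force[v][0] -= push * dx / dist
                force[v][1] -= push * dy / dist
        # Spring along each edge, pulling toward the rest length.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist - rest_length
            force[u][0] += pull * dx / dist
            force[u][1] += pull * dy / dist
            force[v][0] -= pull * dx / dist
            force[v][1] -= pull * dy / dist
        # Move every node a small step along its net force.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return {v: (p[0], p[1]) for v, p in pos.items()}


layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

For the triangle in the example, the three edges settle at the same length, slightly above the rest length because of the repulsion term.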
More Information¶
- https://en.wikipedia.org/wiki/Force-directed_graph_drawing
- https://bl.ocks.org/shimizu/e6209de87cdddde38dadbb746feaf3a3 (shimizu’s D3 v4 - force layout)
- https://bl.ocks.org/mbostock/3750558 (Mike Bostock’s Sticky Force Layout)
- https://bl.ocks.org/emeeks/302096884d5fbc1817062492605b50dd (D3v4 Constraint-Based Layout)
Hive Plot¶
The hive plot is a visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes, and this mapping is based on structural properties of the network. Edges are drawn as curved links. The result is simple and interpretable.
The purpose of the hive plot is to establish a new baseline for visualization of large networks — a method that is both general and tunable and useful as a starting point in visually exploring network structure.
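The node placement can be sketched as follows: each axis gets an angle, evenly spread around the center, and each node sits at a radius on its axis determined by some rank. Which structural property decides the axis and the rank is up to the caller. The function name, the inner offset, and the spacing are assumptions made for this example.

```python
import math


def hive_positions(axis_of, rank_of, axes, inner=1.0, spacing=1.0):
    """Return {node: (x, y)} for nodes placed on radial axes.

    axis_of: node -> axis name (for example derived from node type).
    rank_of: node -> position along the axis (0 is closest to the center).
    axes:    axis names in the order they are spread around the center.
    """
    positions = {}
    for node, axis in axis_of.items():
        angle = 2 * math.pi * axes.index(axis) / len(axes)
        radius = inner + spacing * rank_of[node]
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = hive_positions(
    axis_of={"n1": "low", "n2": "mid"},
    rank_of={"n1": 0, "n2": 2},
    axes=["low", "mid", "high"],
)
```

With three axes, the axes sit at 0, 120 and 240 degrees, so n1 lands at (1, 0) and n2 at a radius of 3 on the second axis.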
Adjacency Matrix¶
An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric. The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory.
The adjacency matrix should be distinguished from the incidence matrix of a graph, a different matrix representation whose elements indicate whether vertex–edge pairs are incident or not, and from the degree matrix, which contains information about the degree of each vertex.
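For a simple undirected graph, these properties are easy to check on a small example. The sketch below builds the (0,1) adjacency matrix from an edge list and derives each vertex's degree from the row sums. The function names are chosen for illustration.

```python
def adjacency_matrix(vertices, edges):
    """Return the adjacency matrix of a simple undirected graph.

    Assumes the edge list contains no self-loops.
    """
    index = {vertex: i for i, vertex in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        # Undirected: set both entries so the matrix stays symmetric.
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def degrees(matrix):
    """Row sums of the adjacency matrix, i.e. the degree of each vertex."""
    return [sum(row) for row in matrix]


# Path graph a - b - c.
matrix = adjacency_matrix(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

The row sums are the diagonal entries of the degree matrix mentioned above.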
More Information¶
- https://github.com/micahstubbs/d3-adjacency-matrix-layout
- https://bl.ocks.org/micahstubbs/7f360cc66abfa28b400b96bc75b8984e (Micah Stubbs’s adjacency matrix layout)
- https://en.wikipedia.org/wiki/Adjacency_matrix