Data Visualisation

The Data Science Stack

The entire data science process, as defined by Ben Fry, consists of 3 main phases, with data visualisation as the end product.

_images/datasci_track.png

from Udacity’s data visualisation & d3.js course

  1. Computer Science: The first phase consists of the ETL process and database creation.
  2. Statistics and Data Mining: The second phase consists of exploratory analysis, model construction and validation.
  3. Graphic Design: The final phase consists of the creation and presentation of data visualisations.

Two Types of Visualisations

Data visualisations can be broadly divided into exploratory and explanatory visualisations.

Exploratory visualisations allow the user to explore and play around with the data. They are usually created in phase 2 of the data science process, and are a conversation between the data and oneself.

Explanatory visualisations aim to present the end user with a clear message. They belong to the final phase of the data science process, and are a conversation between the data and the audience.

Visualisation Stack

Just like programming languages, data visualisation tools can be grouped from low level to high level. High-level tools are usually very easy to use but less flexible, while low-level tools are harder to learn and use, but offer full flexibility.

_images/vis_stack.png

from Udacity’s data visualisation & d3.js course

Visual Encoding

Visual encoding is the way in which data is mapped onto visual structures. Encodings are the building blocks of graphics.

Planar Encoding

Planar encoding is simply the laying out of axes, such as the x and y axes in a simple line chart.

Retinal Encoding

To represent three or more variables, retinal encoding comes into the picture. Examples include size, texture, shape, orientation, color gradient and color hue.
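As a rough illustration of planar versus retinal encoding, the sketch below maps each data record onto the properties of a mark in plain JavaScript (no D3). It assumes a hypothetical dataset with four variables: two go to the planar x/y positions, and two to the retinal channels size and color hue. The field names, colors and the `encode` function are made up for illustration, and positions stay in data units; a real chart would pass them through scales (see the D3 section).

```javascript
// Hypothetical data: one record per country (field names are illustrative only).
const countries = [
  { name: "A", income: 1200, lifeExpectancy: 65, population: 5e6, region: "Asia" },
  { name: "B", income: 8000, lifeExpectancy: 78, population: 40e6, region: "Europe" },
];

// Retinal channel for a categorical variable: color hue (hypothetical palette).
const regionHue = { Asia: "red", Europe: "blue" };

// Map one data record onto the visual channels of a mark.
function encode(d) {
  return {
    x: d.income,                         // planar: position along x
    y: d.lifeExpectancy,                 // planar: position along y
    size: Math.sqrt(d.population),       // retinal: size
    color: regionHue[d.region] || "grey", // retinal: hue, grey for unknown regions
  };
}

const marks = countries.map(encode);
console.log(marks);
```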

Others

  • Time lapse
  • Scale adjustments
  • Slide-based

Ranking of Encodings

People interpret some visual encodings more accurately than others. Research on graphical perception has found that position (along the y or x axis) is read most accurately, followed by length, angle and slope, area, volume, and lastly color and density. Keep this ranking in mind when constructing your visual.

_images/rankingencode.png

from Udacity’s data visualisation & d3.js course
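One way to apply the ranking is to hand the most accurate channels to the most important variables. The sketch below uses a hypothetical helper (`assignChannels` is not from any library) and assumes the variables are quantitative and already sorted by importance; it splits position into separate x and y slots, since a chart has two position axes.

```javascript
// Perceptual ranking from the paragraph above, most to least accurate.
// Position is split into x and y because a chart has two position axes.
const channelRanking = [
  "position (x)",
  "position (y)",
  "length",
  "angle/slope",
  "area",
  "volume",
  "color/density",
];

// Hypothetical helper: pair each variable with the next-best free channel.
// `variables` must already be sorted from most to least important.
function assignChannels(variables) {
  if (variables.length > channelRanking.length) {
    throw new Error("More variables than available channels");
  }
  return variables.map((variable, i) => ({ variable, channel: channelRanking[i] }));
}

// Hypothetical quantitative variables, most important first.
console.log(assignChannels(["sales", "profit", "units"]));
```

Not every pairing makes sense in practice (length, for instance, implies bars), so treat the result as a starting point rather than a rule.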

Charts

The anatomy of a chart can be summarised with this formula:

Visual Encodings + Data Types + Relationships = Chart Types

_images/small_multiples.png

from Udacity’s data visualisation & d3.js course

Most Common Charts

It is important to choose charts that users can understand and recognise easily. The most common ones include scatterplots, bar charts, line charts and maps.

Maps

Choropleth = Map + Color

Cartogram = Map + Size

Dotmap = Map + Shape
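A choropleth needs a rule for turning each region's value into a color. A minimal sketch, assuming equal-interval classes and a made-up five-step sequential palette (similar in spirit to D3's quantize scale), could look like this. The `rates` data and the `makeChoroplethColor` helper are hypothetical.

```javascript
// Hypothetical sequential palette, light to dark (5 classes).
const palette = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"];

// Equal-interval binning: split [min, max] into colors.length classes.
function makeChoroplethColor(values, colors) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / colors.length;
  return (v) => {
    if (step === 0) return colors[0]; // all values equal: use a single class
    const i = Math.floor((v - min) / step);
    return colors[Math.min(i, colors.length - 1)]; // the max value falls in the last class
  };
}

// Hypothetical region -> value data.
const rates = { north: 2, south: 10, east: 6, west: 4 };
const colorOf = makeChoroplethColor(Object.values(rates), palette);

for (const [region, v] of Object.entries(rates)) {
  console.log(region, colorOf(v));
}
```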

Design Principles

Besides appropriate chart types, certain design principles can help to immediately show distinctive trends.

Attention Grabbers

Certain retinal encodings, known as Pre-attentive Attributes, can be used as attention grabbers.

_images/preattentive.png

from Udacity’s data visualisation & d3.js course. Note that in the original, movement is shown by one of the data points flickering.
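A common way to use a pre-attentive attribute is to mute everything else and give only the point of interest a distinctive hue. A minimal sketch with hypothetical monthly data and a made-up `highlight` helper; the two hex colors are arbitrary choices.

```javascript
// Hypothetical bar data.
const sales = [
  { month: "Jan", value: 30 },
  { month: "Feb", value: 42 },
  { month: "Mar", value: 25 },
];

// Mute every mark, then use one pre-attentive attribute (hue) to make a single mark pop out.
function highlight(data, isFocus, focusColor = "#d62728", mutedColor = "#bbbbbb") {
  return data.map((d) => ({ ...d, fill: isFocus(d) ? focusColor : mutedColor }));
}

// Example: draw the eye to the best month.
const best = Math.max(...sales.map((d) => d.value));
const styled = highlight(sales, (d) => d.value === best);
console.log(styled);
```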

Importance of Null Values

Values that are null or zero can be useful for highlighting certain trends. This is known as Negative Space. An example is John Snow’s cholera map, in which the brewery shows no deaths from the disease.

_images/johnsnow.png

from Udacity’s data visualisation & d3.js course.

Keep it Simple

Keeping it simple is probably one of the most important design principles. The aim is to remove all clutter that does not help convey the message. Such clutter is known as Chart Junk: it only serves to distract users from the data. Some examples include:

  1. Excessive colors for categorical data
  2. Special effects such as 3D and shadows
  3. Too many labels
  4. Prominent grid lines
  5. Fancy pictures or graphics

The Data-Ink Ratio is a useful concept used to quantify how much clutter is in a chart. Its formula is defined as:

_images/dataink.png

from Udacity’s data visualisation & d3.js course.

A high data-ink ratio indicates a simple, clean chart, while a low ratio signifies a complicated, cluttered one.
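Following the usual definition (data-ink divided by the total ink used to print the graphic), the ratio can be computed once each element's ink is known. The sketch assumes a hypothetical inventory where ink is measured as something like a pixel count; which elements count as data is a judgment call, and `dataInkRatio` is a made-up helper.

```javascript
// Hypothetical inventory of chart elements with their "ink" (e.g. non-background pixel counts).
const elements = [
  { name: "bars", ink: 5200, isData: true },
  { name: "data labels", ink: 800, isData: true },
  { name: "heavy grid lines", ink: 2400, isData: false },
  { name: "3D shadow", ink: 1600, isData: false },
];

// data-ink ratio = data-ink / total ink used in the graphic
function dataInkRatio(items) {
  const total = items.reduce((sum, e) => sum + e.ink, 0);
  const data = items.filter((e) => e.isData).reduce((sum, e) => sum + e.ink, 0);
  return total === 0 ? 0 : data / total;
}

console.log(dataInkRatio(elements).toFixed(2)); // "0.60"

// Removing the chart junk raises the ratio.
const decluttered = elements.filter((e) => e.isData);
console.log(dataInkRatio(decluttered)); // 1
```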

Keep it Honest

It is important to preserve the integrity of the data in a visualisation. Charts can be manipulated to push a particular message on the end user by ‘cheating’. Here are some tricks that should be avoided:

  1. 3D effects, which distort the scaling
  2. Exaggerating differences by not starting the axis at 0
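For bar charts, where bar length encodes the value, a quick calculation shows how much a truncated axis exaggerates a difference. This hypothetical sketch compares two values that differ by 5%, drawn once with the axis starting at 0 and once starting at 95; `barHeight` and the 300 px chart height are assumptions for illustration.

```javascript
// Height of a bar (in px) on a linear axis running from `baseline` to `maxValue`.
function barHeight(value, baseline, maxValue, chartHeight) {
  return ((value - baseline) / (maxValue - baseline)) * chartHeight;
}

// Hypothetical data: two values that differ by only 5%.
const a = 100;
const b = 105;
const H = 300; // assumed chart height in px; the axis tops out at 105

const honest = barHeight(b, 0, 105, H) / barHeight(a, 0, 105, H);
const truncated = barHeight(b, 95, 105, H) / barHeight(a, 95, 105, H);

console.log(honest.toFixed(2));    // "1.05": bar B looks 5% taller
console.log(truncated.toFixed(2)); // "2.00": bar B looks twice as tall
```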

Dashboard Design Principles

Designing dashboards is a different beast, as it involves a combination of different visualisations that interact and tell a story in tandem.

Sisense’s Principles

Sisense provides 4 key principles of a good dashboard.

1. 5 Second Rule

Your dashboard should give users an immediate understanding of the data and surface the relevant information within 5 seconds.

2. Logical Layout: Inverted Pyramid

Display the most significant insights on the top part of the dashboard, trends in the middle, and granular details in the bottom.

3. Minimalist

Do not put too many charts in a single dashboard, and aim for a high data-ink ratio.

4. The Right Visualisation

It is also essential to choose the appropriate charts for the purpose.


Cole’s Principles

In her book, Storytelling with Data, Cole Nussbaumer Knaflic provides a more detailed list of six principles. Note that her principles are not specific to dashboarding, so not all of them (e.g., storytelling) are relevant.

1. Understanding the Context

Have a good understanding of who you are communicating with, what you need them to know, how you will communicate with them, and what data you have to back up your case.

2. Choose Appropriate Visualisation

Simple text works best when highlighting a number or two. Know

3. Eliminate Clutter

Same as the Sisense minimalist principle above.

4. Focus Attention on the Important Parts

Use retinal encodings and preattentive attributes such as color and size to signal what is important. Draw attention to where you want your audience to look, and guide them through the visual. Evaluate the effectiveness with the “where are your eyes drawn?” test.

5. Think Like a Designer

Pay attention to aesthetics, keep the design minimalist, and think about how to engage the audience with visual cues.

6. Tell a Story

Craft a story with a clear beginning, middle, and end.

D3.JS

D3 (Data-Driven Documents) is a popular JavaScript library for web data visualisation. It was created by Mike Bostock. See its GitHub page for more.

Chains

D3 uses method chaining to link sequential method calls together. This makes it easy to apply multiple operations to the same element.
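Chaining works because each method returns the object it was called on. In D3, selection methods such as attr return the selection, while append returns a new selection holding the appended elements, which is why the circle example in the Scales section can be written as one chain. The toy `ToyElement` class below is hypothetical and not part of D3; it only shows the pattern.

```javascript
// A toy, hypothetical element builder (not D3's API) that mimics D3-style chaining.
class ToyElement {
  constructor(tag) {
    this.tag = tag;
    this.attrs = {};
  }

  attr(name, value) {
    this.attrs[name] = value;
    return this; // returning `this` is what makes chaining possible
  }

  toString() {
    const attrs = Object.entries(this.attrs)
      .map(([k, v]) => ` ${k}="${v}"`)
      .join("");
    return `<${this.tag}${attrs}/>`;
  }
}

const circle = new ToyElement("circle").attr("fill", "red").attr("r", 10).attr("cx", 50);
console.log(circle.toString()); // <circle fill="red" r="10" cx="50"/>
```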

Scales

D3 draws into SVG, whose coordinate system works like a normal x-y plane with one important exception: the origin is at the top-left corner, so the y-axis increases downwards instead of upwards.

_images/d3_scale.png

from Udacity’s data visualisation & d3.js course

To draw data, actual values must be mapped onto screen coordinates. The domain is the extent of the actual data values, while the range is the corresponding extent of output (pixel) values in the D3 drawing.

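// gapminder China graph (d3 v3 API; in d3 v4+ use d3.scaleLinear, d3.scaleLog, d3.scaleSqrt)
// assumes d3 v3 is loaded on the page; the svg size below is an assumption
var svg = d3.select('body').append('svg').attr('width', 600).attr('height', 250);
// y-axis: note the inverted range, so larger values are drawn higher up
var y = d3.scale.linear().domain([15, 90]).range([250, 0]);
// x-axis: a log scale with a normal (non-inverted) range
var x = d3.scale.log().domain([250, 1000]).range([0, 600]);
// circle radius: a sqrt scale
var r = d3.scale.sqrt().domain([52070, 138000000]).range([10, 50]);
// create an svg circle with red fill, and radius, x and y positions taken from the scales
// note: 1330 lies outside the x domain [250, 1000], so x(1330) extrapolates past 600 px
svg.append('circle')
    .attr('fill', 'red')
    .attr('r', r(138000000))
    .attr('cx', x(1330))
    .attr('cy', y(77));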
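To see what the scales in the gapminder example actually compute, here is a dependency-free sketch of the underlying maths. `linearScale`, `logScale` and `sqrtScale` are hypothetical helpers, not D3 functions: they skip clamping, ticks and nice(), and simply interpolate linearly between domain and range (after taking the log or square root of the value for the non-linear scales). The parameters are copied from the example.

```javascript
// Linear interpolation from [d0, d1] to [r0, r1]; no clamping, so values outside the domain extrapolate.
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// A log scale is a linear scale applied to log(value); the domain must be > 0.
function logScale([d0, d1], range) {
  const lin = linearScale([Math.log(d0), Math.log(d1)], range);
  return (v) => lin(Math.log(v));
}

// A sqrt scale is a linear scale applied to sqrt(value). For circle radii this makes
// area grow roughly with the value (exactly proportional only when domain and range start at 0).
function sqrtScale([d0, d1], range) {
  const lin = linearScale([Math.sqrt(d0), Math.sqrt(d1)], range);
  return (v) => lin(Math.sqrt(v));
}

const y = linearScale([15, 90], [250, 0]); // inverted range: larger values sit higher up
const x = logScale([250, 1000], [0, 600]);
const r = sqrtScale([52070, 138000000], [10, 50]);

console.log(y(15), y(90));    // 250 0
console.log(x(250), x(1000)); // 0 600
console.log(r(138000000));    // 50
```

With these parameters x(1330) lands beyond 600 px, because 1330 lies outside the x domain; it is worth checking data against the domain before drawing.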

Typical D3 process

_images/d3_grammer.png

from Udacity’s data visualisation & d3.js course

Mike Bostock, the creator of D3.js, has a guide to building a simple bar chart, available at this link.

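Launch a basic local server using Python:

$ python -m SimpleHTTPServer
Note the capitalisation. SimpleHTTPServer is the Python 2 module; on Python 3, run python -m http.server instead.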

From the Best

Facebook IPO

New York Times, link

Interesting Techniques:

  1. Rescaling of the axis to reveal Facebook.
  2. Using a log scale to represent proportional increases.
_images/facebook.png

Gay Rights in US

The Guardian, link

Interesting Techniques:

  1. Good use of negative space to highlight US states with no (or only grey-area) gay rights
_images/gayrights.png

Effects of Vaccines

Wall Street Journal, link

Interesting Techniques

  1. Easily meets the 5-second rule
  2. Effective use of negative space
  3. Good use of colour for emphasis
_images/eg_vaccines.png

Portfolio Dashboard

Green Climate Fund, link

Interesting Techniques

  1. Use of simple charts
  2. Animating charts so that they grow from 0 to their actual values.
_images/greenclimate.png
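The grow-from-zero animation amounts to interpolating each bar from 0 to its real value over time, usually with an easing curve. A minimal sketch, assuming a cubic ease-in-out curve and hypothetical helper names (`easeCubicInOut`, `tweenFromZero`); in a browser you would call it on every animation frame, and D3's transitions can do this interpolation for you.

```javascript
// Cubic ease-in-out: slow start, fast middle, slow finish (t in [0, 1]).
function easeCubicInOut(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Value a bar should show after `elapsed` ms of a `duration` ms animation from 0 to `target`.
function tweenFromZero(target, elapsed, duration) {
  const t = Math.min(Math.max(elapsed / duration, 0), 1); // clamp progress to [0, 1]
  return target * easeCubicInOut(t);
}

// Sample a 1-second grow-in animation for a bar whose real value is 80.
for (const ms of [0, 250, 500, 750, 1000]) {
  console.log(ms, tweenFromZero(80, ms, 1000).toFixed(1));
}
```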

Resources

Courses

1. Data Visualization and D3.js: A free Udacity course that offers a good introduction to data visualisation and the popular JavaScript library D3.js. Many big names in the vis community appear in this course as guests.

Blogs

  1. storytellingwithdata
  2. flowingdata
  3. helpmeviz: A curated site where users commented on uploaded visuals

Books

  1. Storytelling with Data by Cole Nussbaumer Knaflic
  2. The Functional Art by Alberto Cairo