Data Visualisation¶
The Data Science Stack¶
The entire data science process, as defined by Ben Fry, consists of 3 main phases, with data visualisation as the end product.

from Udacity’s data visualisation & d3.js course
- Computer Science: The first phase consists of the ETL process and database creation.
- Statistics and Data Mining: The second phase consists of exploratory analysis, model construction and validation.
- Graphic Design: The final phase consists of the creation and presentation of data visualisations.
Two Types of Visualisations¶
Data visualisations can be broadly divided into exploratory and explanatory visualisations.
Exploratory visualisations allow the user to explore and play around with the data. They are usually produced in phase 2 of the data science process. They are a conversation between the data and oneself.
Explanatory visualisations aim to convey a message to the end-user. They belong to the final phase of the data science process. They are a conversation between the data and the audience.
Visualisation Stack¶
Just like programming languages, data visualisation tools can be grouped from low level to high level. High-level tools are usually easy to use but less flexible, while low-level tools are harder to learn and to build with, but offer full flexibility.

from Udacity’s data visualisation & d3.js course
Visual Encoding¶
Visual encoding is the way in which data is mapped onto visual structures. Encodings are the building blocks of graphics.
Planar Encoding¶
Planar encoding is as simple as laying out axes, like the x and y axes in a simple line chart.
Retinal Encoding¶
To represent three or more variables, retinal encoding comes into the picture. Size, texture, shape, orientation, color gradient and color hue are some examples.
Others¶
- Time lapse
- Scale adjustments
- Slides based
Ranking of Encodings¶
People interpret some visual encodings more accurately than others. One study showed that position (along the x or y axis) is the best, followed by length, angle and slope, area, volume, and lastly color and density. Pay attention to this ranking when constructing your visual.

from Udacity’s data visualisation & d3.js course
Charts¶
The formula of a chart anatomy can be written as:
Visual Encodings + Data Types + Relationships = Chart Types

from Udacity’s data visualisation & d3.js course
Most Common Charts¶
It is important to choose charts that users can understand and recognise easily. The most common ones include scatterplots, bar charts, line charts and maps.
Design Principles¶
Besides appropriate chart types, certain design principles can help to immediately show distinctive trends.
Attention Grabbers¶
Certain retinal encodings can be used as attention grabbers, also known as pre-attentive attributes.

from Udacity’s data visualisation & d3.js course. Note that movement is supposed to have a flickering effect for one of the data points.
Importance of Null Values¶
It may be useful to use values which are null or zero to highlight certain trends. This is known as negative space. An example is John Snow’s cholera map, with the brewery showing no deaths from the disease.

from Udacity’s data visualisation & d3.js course.
Keep it Simple¶
Probably the most important design principle is keeping it simple. The aim is to remove all clutter that does not help to convey the message. Such clutter is known as chart junk, and it only serves to distract users from the data. Some examples include:
- Excessive colors for categorical data
- Special effects like 3D and shadows
- Too many labels
- Distinctive grid lines
- Fancy pictures or graphics
The Data-Ink Ratio is a useful concept used to quantify how much clutter is in a chart. Its formula is defined as:

from Udacity’s data visualisation & d3.js course.
A high data-ink ratio means it is a simple and neat chart, while a low ratio signifies a complicated and cluttered chart.
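For reference, the definition commonly attributed to Edward Tufte, who introduced the term, can be written as follows. This assumes the course uses the same definition.

```latex
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
```

A ratio close to 1 means that almost all of the ink in the chart encodes data.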
Keep it Honest¶
It is important to preserve the integrity of the data in a visualisation. There are various ways in which charts can be manipulated to convey a certain message to the end-user by ‘cheating’. Here are some that should be avoided:
- 3D effects, which distort the scaling
- Exaggerating differences by not starting the axis from 0
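The effect of an axis that does not start from 0 can be checked with a little arithmetic. The sketch below is only an illustration: the helper `drawnHeightRatio` and the values 50, 55 and 45 are made up for this example.

```javascript
// Hypothetical helper: how many times taller the larger bar is drawn,
// given the value at which the axis starts.
function drawnHeightRatio(smaller, larger, axisStart) {
  return (larger - axisStart) / (smaller - axisStart);
}

// Illustrative values only: 55 is 10% larger than 50.
var honest = drawnHeightRatio(50, 55, 0);     // axis starts at 0
var stretched = drawnHeightRatio(50, 55, 45); // axis starts at 45

console.log(honest, stretched);
```

With the axis starting at 45, the bars are drawn 5 and 10 units tall, so a 10% difference in the data appears as a bar twice as tall.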
Dashboard Design Principles¶
Designing dashboards is a different beast, as it involves a combination of different visualisations that interact and tell a story in tandem.
Sisense’s Principles¶
Sisense provides 4 key principles of a good dashboard; three of them are summarised here.
1. 5 Second Rule¶
Your dashboard should give users an immediate understanding of the data, as well as the relevant information, within 5 seconds.
2. Logical Layout: Inverted Pyramid¶
Display the most significant insights on the top part of the dashboard, trends in the middle, and granular details in the bottom.
3. Minimalist¶
Do not put too many charts in a single dashboard. Aim for a high data-ink ratio.
Cole’s Principles¶
In her book, Storytelling with Data, Cole Nussbaumer Knaflic provides a more detailed list of six principles. Note that her principles are not specific to dashboarding, so not all of them are relevant, e.g., storytelling.
1. Understanding the Context¶
Have a good understanding of who you are communicating to, what you need them to know, how you will communicate with them, and what data you have to back up your case.
2. Choose Appropriate Visualisation¶
Simple text is the best when highlighting a number or two. Know
3. Eliminate Clutter¶
As with the Minimalist principle from Sisense above.
4. Focus Attention on the Important Parts¶
Employ retinal encodings and pre-attentive attributes like color and size to signal what is important. Draw attention to where you want your audience to look, and guide them through the visual. Evaluate the effectiveness by applying the “where are your eyes drawn?” test.
5. Think Like a Designer¶
Pay attention to aesthetics, stay minimalist, and think about how to engage the audience with visual cues.
6. Tell a Story¶
Craft a story with a clear beginning, middle, and end.
D3.JS¶
Data-Driven Documents (D3) is a popular JavaScript library used for web data visualisations. It was created by Mike Bostock. See more on its GitHub page.
Chains¶
D3 uses method chaining to link sequential method calls together. This makes it easy to apply multiple operations to the same element.
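Chaining works because each method returns an object on which the next method can be called. The sketch below shows the idea in plain JavaScript. `FakeSelection` is a hypothetical stand-in written for this example, not D3's implementation.

```javascript
// Hypothetical stand-in for a D3 selection; not part of D3.
function FakeSelection(tag) {
  this.tag = tag;
  this.attrs = {};
}

// The setter returns `this`, which is what makes chaining possible.
FakeSelection.prototype.attr = function (name, value) {
  this.attrs[name] = value;
  return this;
};

// Without chaining: one statement per attribute.
var unchained = new FakeSelection('circle');
unchained.attr('fill', 'red');
unchained.attr('r', 10);

// With chaining: the same result in a single expression.
var chained = new FakeSelection('circle').attr('fill', 'red').attr('r', 10);
```

In D3 itself, `attr` returns the same selection, while `append` returns a selection of the newly added element, so calls chained after `append` apply to the new element.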
Scales¶
D3 draws into SVG, whose coordinate system is similar to a normal x-y plane, with the important exception that the origin is at the top-left corner and y values increase downwards instead of upwards.

from Udacity’s data visualisation & d3.js course
To translate data values into positions on screen, a D3 scale maps one interval onto another. The domain is the interval of actual data values, while the range is the interval of output values (here, pixel positions) that D3 draws with.
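// Gapminder China graph (D3 v3 API; v4+ renamed these to d3.scaleLinear, d3.scaleLog, d3.scaleSqrt)
// assumption: an svg container sized to match the ranges below (not in the original snippet)
var svg = d3.select('body').append('svg').attr('width', 600).attr('height', 250);
// y-axis: the range is inverted because SVG y values increase downwards
var y = d3.scale.linear().domain([15, 90]).range([250, 0]);
// x-axis: a log scale running left to right
var x = d3.scale.log().domain([250, 1000]).range([0, 600]);
// circle sizing scale
var r = d3.scale.sqrt().domain([52070, 138000000]).range([10, 50]);
// create an svg circle with red fill; radius, x and y come from the scales
// note: 1330 lies outside the x domain, so x(1330) lands beyond 600 unless the scale is clamped
svg.append('circle')
  .attr('fill', 'red')
  .attr('r', r(138000000))
  .attr('cx', x(1330))
  .attr('cy', y(77));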
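To see what the linear and log scales above compute, here is a minimal sketch in plain JavaScript that runs without D3. The helpers `linearScale` and `logScale` are hypothetical names invented for this example; they only mimic the domain-to-range mapping and are not D3's implementation.

```javascript
// Hypothetical helpers, not D3's API.
function linearScale(domain, range) {
  return function (value) {
    // fraction of the way through the domain (0 at domain[0], 1 at domain[1])
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

function logScale(domain, range) {
  // a log scale is a linear scale applied to the logarithm of the value
  var inner = linearScale([Math.log(domain[0]), Math.log(domain[1])], range);
  return function (value) {
    return inner(Math.log(value));
  };
}

// Same domains and ranges as the snippet above.
var yPixel = linearScale([15, 90], [250, 0]);
var xPixel = logScale([250, 1000], [0, 600]);

console.log(yPixel(15)); // 250: the smallest value sits at the bottom
console.log(yPixel(90)); // 0: the largest value sits at the top
console.log(xPixel(500)); // about 300: 500 is one doubling out of two
```

Because the y range is written as `[250, 0]`, larger data values get smaller pixel values, which places them higher on the screen.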
From the Best¶
Facebook IPO¶
New York Times, link
Interesting Techniques:
- Rescaling of the axes to reveal Facebook
- Using a log scale to represent proportional increase
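The reason a log scale suits proportional increase is that equal ratios cover equal distances on the axis. A minimal sketch follows; `logPosition` is a hypothetical helper and the values are illustrative only.

```javascript
// Hypothetical helper: position on a log axis, measured in powers of ten.
function logPosition(value) {
  return Math.log(value) / Math.LN10;
}

// Both steps are a doubling, at very different magnitudes.
var stepSmall = logPosition(20) - logPosition(10);
var stepLarge = logPosition(2000) - logPosition(1000);

console.log(stepSmall, stepLarge);
```

Both steps come out at the same length (up to floating-point rounding), whereas on a linear axis the second step would be 100 times longer than the first.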

Gay Rights in US¶
The Guardian, link1
Interesting Techniques:
- Good use of negative space to highlight absent or grey-area gay rights in the US

Resources¶
Courses¶
1. Data Visualization and d3.js: A free Udacity course that offers a good introduction to data visualisation and the popular JavaScript library D3.js. Many leading figures in the vis community appear in this course.
Blogs¶
- storytellingwithdata
- flowingdata
- helpmeviz: A curated site where users comment on uploaded visuals
Books¶
- Storytelling with Data by Cole Nussbaumer Knaflic
- The Functional Art by Alberto Cairo