Visual analysis best practices

June 14th, 2018, Published in Articles: PositionIT, Featured: PositionIT

This article outlines simple techniques for making data visualisations useful and aesthetically pleasing. Rather than giving instructions for building visualisations, it offers tips for making them more effective.

The difference between a good visualisation and a great one is that a great visualisation helps you understand more about the data more quickly. There are many ways to do this.

Start with questions

The single most important step in making a great visualisation is knowing what you are trying to say. With the amount of data available, anyone can get lost in a world of scatter plots and geocoding. It is vital that the visualisation has a purpose, and that you are selective about what to include in it to fulfil that purpose.

To know if a visualisation has a purpose, ask: Who is the audience? What questions do they have? What answers does the visualisation offer? What other questions does it inspire? What conversations will result? Viewers should take something away from the time they spend with a visualisation.

Suppose you work for a stockbroker who focuses on IPO investments, and you want to make a visualisation to help him decide where to invest. You might ask a question like “Does profitability at IPO affect stock performance?” That might lead you to produce a visualisation such as the one in Fig. 1a. This view suggests that a company's profitability at IPO has a large effect on its later stock performance. However, this dataset contains information on all software company IPOs over the past three decades, so you might wonder whether the trend holds true across all periods. Another view, such as the one in Fig. 1b, could better help answer that question.

Fig. 1: Different visualisations of the same data help answer different questions.

This view shows that the trend applies only to the 1990s. It also shows that all companies were profitable at IPO in the 1980s, and that profitability at IPO did not have a large impact on stock performance in the 2000s. Does this mean that modern investors are more willing to take risks than their predecessors? Or do companies that were not profitable at IPO have the same likelihood of future success as those that were? These questions can be explored further.

Choose the right chart type

Next, think about what types of analysis will help achieve a visualisation’s purpose.

Analysing trends

One of the most frequently used methods for analysing data is tracking a trend over time. Some of the best visualisations for showing trends are line charts, area charts and bar charts. Fig. 2a shows trends, by sector, in the flow of venture-financing funds. It is conventional to put time on the X-axis and the measure on the Y-axis. In this line chart, the year is on the X-axis, the funding amount is on the Y-axis and the sector is encoded with colour. The view shows that all sectors follow the same general trend in funding over time, while keeping each sector's trend and the differences between sectors visible.

Fig. 2a: Trends over time, by sector, in the flow of venture-financing funds.

But what about the overall funding trend: exactly how much funding was there across all sectors in 2000, or at any other point in time? A line chart cannot show this. If answering these questions is important, area charts or bar charts (Fig. 2b and 2c) are useful, because they emphasise the total funding trend and show how each individual sector contributes to the total over time. There is one distinct difference between them: the area chart treats each sector as a single pattern, while the bar chart treats each year as a single pattern.
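To make the stacking concrete, here is a minimal Python sketch of how a stacked area or bar chart might compute the band each sector occupies in each year; the top of the last band is that year's total, which a line chart leaves implicit. The function name `stacked_layers`, the `(year, sector) -> amount` input layout and the figures in the example are hypothetical, not taken from the article's dataset.

```python
from collections import defaultdict


def stacked_layers(funding):
    """Compute stacked bands from a mapping of (year, sector) -> amount.

    Returns {year: [(sector, bottom, top), ...]}. Sectors are stacked in
    alphabetical order, and the top of the last band is the year's total.
    """
    rows_by_year = defaultdict(list)
    # Sorting the keys orders years first, then sectors alphabetically.
    for (year, sector), amount in sorted(funding.items()):
        rows_by_year[year].append((sector, amount))

    layers = {}
    for year, rows in rows_by_year.items():
        bottom = 0
        bands = []
        for sector, amount in rows:
            # Each sector sits on top of the running total so far.
            bands.append((sector, bottom, bottom + amount))
            bottom += amount
        layers[year] = bands
    return layers


# Hypothetical funding amounts, for illustration only.
funding = {(2000, "Software"): 30, (2000, "Biotech"): 10, (2001, "Software"): 20}
layers = stacked_layers(funding)
total_2000 = layers[2000][-1][2]  # top of the highest band is the year's total
```

The stacking order is a design choice: keeping it fixed across years lets each sector's band be followed from one year to the next.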

Fig. 2b and 2c: The area chart treats each sector as a single pattern while the bar chart focuses on each year as a single pattern.

Comparison and ranking

Another method for analysing data is comparison and ranking (Fig. 3). We compare and rank countries, regions, business segments, salespeople and sports players based on one criterion or a set of criteria. In many cases, this shows us where we stand and how we are doing. A bar chart is great for comparison and ranking because it encodes quantitative values as length on a common baseline, making values easy to compare.

Fig. 3: Analysing data by comparison and ranking.

Correlation

Looking for relationships between measures is something we do all the time in data analysis, and a simple correlation analysis is a good place to start. Be mindful that correlation does not guarantee a relationship; it only suggests a potential one. Confirming that the relationship truly exists often requires a more sophisticated methodology.
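As a sketch of what a simple correlation analysis computes, the Python function below calculates the Pearson correlation coefficient, which ranges from -1 (a perfect negative linear relationship) to +1 (a perfect positive one). Pearson's r is only one common choice of correlation measure; the function name and the price and quantity figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalised (the n's cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures: higher prices, lower quantities.
price = [10, 12, 14, 16, 18]
quantity = [50, 44, 40, 31, 25]
r = pearson_r(price, quantity)  # strongly negative, close to -1
```

A value close to -1 signals a strong negative linear relationship, but, as noted above, it does not by itself establish why the two measures move together.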

Fig. 4 is an example of a simple scatter plot used to detect correlation between two factors. The data comes from a delicatessen food wholesaler. Sales price appears on the Y-axis and sales quantity on the X-axis, with monthly sales figures as the level of detail. There is a clear negative correlation between sales price and quantity, which becomes even clearer after adding a trend line: when price is high, quantity is low, and vice versa. Does this mean that the company should lower prices to boost sales? Not necessarily. This is why net profit is encoded as the size of the circles. From this, it looks like the company makes the greatest profit at both ends of the price range.
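Trend lines on scatter plots are commonly fitted with ordinary least squares. The sketch below assumes that kind of linear trend line and returns the slope and intercept of the best-fit line y = slope * x + intercept, with quantity as x and price as y to match the axes described for Fig. 4. The function name and data are hypothetical.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales: quantity on the X-axis, price on the Y-axis.
quantity = [25, 31, 40, 44, 50]
price = [18, 16, 14, 12, 10]
slope, intercept = linear_trend(quantity, price)  # slope < 0: a downward trend
```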

Fig. 4: A simple scatter plot to detect correlations.

Different chart types can be complementary. Fig. 5 combines two line charts with a bar chart. Placing the trend lines for sales price and quantity side by side at the top guides the viewer’s focus towards comparing these two trends. The negative correlation remains clear, while the net-profit bar chart below it provides additional information without interrupting the correlation analysis.

Fig. 5: Combining line charts with a bar chart can help compare trends.

Distribution

Distribution analysis is useful because it shows how quantities are distributed across a range. For example, a hospital might want to look at the distribution of patient treatment durations. Two common ways to do this are with a box plot and a histogram.

Box plots are useful for displaying multiple distributions. They pack all the data points (in this case, minutes per patient) into a box-and-whisker display (Fig. 6a). This makes it easy to identify the minimum values, 25th percentiles, medians, 75th percentiles and maximum values across all categories at once. What really stands out in this box plot is that treatment length varies widely between patients in the Emergent and Non-Urgent categories, because their boxes are much larger. This raises the question of why, and could prompt further investigation.
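The five values a box plot draws can be computed with Python's standard `statistics.quantiles`. This is a minimal sketch: the helper name and the durations are hypothetical, it uses the "inclusive" quartile convention (other tools may compute quartiles differently), and it uses the plain minimum and maximum as whisker ends as the text describes, whereas many tools cap whiskers at 1.5 times the interquartile range.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # method="inclusive" treats the data as the whole population;
    # other tools may use a different quartile convention.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical treatment durations in minutes for one triage category.
minutes = [35, 50, 55, 60, 70, 75, 80, 95, 140]
low, q1, median, q3, high = five_number_summary(minutes)
iqr = q3 - q1  # interquartile range: the larger the box, the wider the spread
```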

Distributions can also be displayed with a histogram (Fig. 6b). Instead of breaking up the data by Triage Acuity and plotting the time that each patient spends in each category, a histogram divides the data into time segments and counts the number of patients in each segment. This also shows that the peak (most common) treatment length is 70 minutes. The bars can also be coloured to show how the patient count varies by Triage Acuity category. Doing this shows that there are patients in multiple categories in most of the time segments, and that “Urgent” and “Less Urgent” are the most common categories.
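The binning a histogram performs can be sketched in a few lines of Python: each duration is assigned to a fixed-width time segment and the segments are counted, optionally split by category as in the coloured version. The helper names, the 10-minute bin width and the records are hypothetical.

```python
from collections import Counter, defaultdict


def histogram(durations, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((d // bin_width) * bin_width for d in durations)
    return dict(sorted(counts.items()))


def stacked_histogram(records, bin_width):
    """Like histogram(), but split each bin's count by category.

    records is an iterable of (duration, category) pairs.
    """
    bins = defaultdict(Counter)
    for duration, category in records:
        bins[(duration // bin_width) * bin_width][category] += 1
    return {edge: dict(counts) for edge, counts in sorted(bins.items())}


# Hypothetical (minutes, triage category) records.
records = [(72, "Urgent"), (75, "Less Urgent"), (78, "Urgent"), (15, "Emergent")]
counts = histogram([duration for duration, _ in records], bin_width=10)
peak_bin = max(counts, key=counts.get)  # lower edge of the most common bin
by_category = stacked_histogram(records, bin_width=10)
```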

Fig. 6: Box plots are useful for displaying multiple distributions (a). Distributions can also be displayed with a histogram (b).

Proportions

There are occasions when you want to do a part-to-whole analysis. Although pie charts are commonly used in this type of situation, we suggest avoiding them for two reasons: first, the human visual system is not very good at estimating area, and second, it is only easy to compare slices that are right next to each other. For example, in the chart in Fig. 7a, can you tell which slice is the largest, or how the Western region differs across age groups? The same data plotted on a percent-total bar chart (Fig. 7b), however, makes it clear that the 25 to 40 age group in the Western region is the largest slice. It also shows the regional differences across age groups better.
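The arithmetic behind a percent-total bar chart is simple: divide each value by the grand total. The article does not say whether Fig. 7b uses the grand total or a per-region total, so this Python sketch assumes the grand total; the helper name and the counts are hypothetical.

```python
def percent_of_total(values):
    """Convert each value to its percentage of the grand total."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("total must be non-zero")
    return {key: 100 * value / total for key, value in values.items()}


# Hypothetical customer counts keyed by (region, age group).
customers = {
    ("West", "25 to 40"): 50,
    ("West", "41 to 60"): 30,
    ("East", "25 to 40"): 20,
}
shares = percent_of_total(customers)
largest = max(shares, key=shares.get)  # the biggest bar in a percent-total chart
```

Because every share is measured against the same total and drawn as a length, the largest part can be read directly, which is the advantage over comparing pie slices.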

Fig. 7: Percent-total bar chart (b) can be more effective than pie charts (a) to show proportions.

Geographical data

When you want to show a location, use a map. Remember that maps are often best when paired with another chart that details what the map displays, such as a bar chart sorted from greatest to least, a line chart showing the trends, or even a cross-tab to show the actual data. Although Tableau does not recommend pie charts for depicting proportional relationships, they can be useful on maps, such as in the website-traffic map in Fig. 8. By using pie charts on the map, the viewer gets a rough breakdown of each country, which can be very useful when complemented by other chart types like those already mentioned.

Fig. 8: Pie charts can be useful on maps for rough breakdown views.

Fine tuning details

Even after choosing the best chart type(s) for an analysis, creating effective views requires effort, intuition, attention to detail – and trial and error.

Emphasise the most important data

Many chart types contain multiple measures and dimensions in one view. In scatter plots, for example, measures can be placed on the X- or Y-axis, or encoded on the marks as colour, size or shape. Where to put each measure depends on the kind of analysis and what you are trying to emphasise. A rule of thumb is to put the most important data on the X- or Y-axis and show the less important data with colour, size or shape.

Fig. 9a shows data intended to help home buyers understand the relationship between home price, home size, lot size and the type of home they are interested in. The relationship between price and lot size is clear. But is this the most important information for home buyers? The relationship between price and home size probably takes precedence, which is why Fig. 9b is more effective.

Fig. 9: Choosing what takes precedence makes visualisations effective.

Orient and organise views for legibility

Simple changes can go a long way toward making visualisations easy to interact with. Fig. 10a is probably difficult to read because all of the labels are vertically oriented. A simple change to a horizontal orientation makes the chart easier to read and the comparisons clearer (Fig. 10b).

Fig. 10: A simple change to a horizontal orientation makes the chart easier to read and the comparisons clearer.

When evaluating a sales team by comparing their sales with their quotas, it seems intuitive to put the two measures side by side (Fig. 11a). In Fig. 11a, for example, it is clear that Greg Powell is above his quota, but it is harder to determine by how much compared with his colleagues. Putting sales and quota data into rows instead (Fig. 11b) creates a shared baseline for the sales bar and the quota bar, which makes comparison easier. Now we can see that Greg Powell is above quota, but only marginally. A bullet chart (Fig. 11c) might be even more effective, as it combines a bar chart with reference lines to create a visual comparison between actual and target numbers. In this instance, “actual” is sales (bars) and “target” is quota (vertical reference lines). Not only can one easily see how each salesperson is performing against his or her quota, but the chart also halves the number of bars.
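The comparison a bullet chart makes visible can also be expressed numerically. This hypothetical Python helper computes, for each salesperson, how far sales are above or below quota and sales as a percentage of quota, ranked from highest attainment to lowest; the names and figures are made up.

```python
def quota_attainment(sales, quotas):
    """Return (name, sales - quota, percent of quota), best attainment first."""
    rows = []
    for name, amount in sales.items():
        quota = quotas[name]
        rows.append((name, amount - quota, 100 * amount / quota))
    # Rank by percentage of quota so people with different quotas compare fairly.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sales and quotas.
sales = {"Rep A": 110, "Rep B": 90, "Rep C": 104}
quotas = {"Rep A": 100, "Rep B": 100, "Rep C": 100}
ranking = quota_attainment(sales, quotas)
```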

Fig. 11: Putting data into rows (b) instead of next to each other (a) creates a shared baseline for easier comparison. A bullet chart (c) can be even more effective as it makes the comparison more succinct.

Avoid data overload

Overloading a visualisation is one of the most common mistakes. Instead of stacking countries, departments and profit into one condensed view (Fig. 12a), break them down into small multiples (Fig. 12b), which are more legible and easier to understand.

Fig. 12: Overloading a visualisation is a common mistake (a); breaking the data down into small multiples makes it more legible and easier to understand (b).

Limit colours and shapes

Effective use of colour and shape can help emphasise patterns, but too many colours and shapes usually defeat that purpose. With many colours and lines clustered together it is almost impossible to distinguish values, let alone patterns. Using similar colours can have the same effect. This issue can be mitigated by choosing to emphasise another aspect or category of the data.

Design holistic dashboards

A dashboard is a collection of several related visualisations on a single page, usually tied together interactively. Dashboards increase the analytical power of a visualisation by showing multiple perspectives in the same location. They can also be used to combine multiple types of data in a single location.

When designing a dashboard, it is important to structure it in a way that is accessible to your audience. Fig. 13, for example, is an interactive dashboard that tells a single story. It is accessible because it guides the viewer sequentially through each important piece of the story: the crime locations, the day of the week and the crime frequency. In addition, the interactive panel on the top right is hard to miss, and instructions for interacting are subtly embedded in the titles.

Fig. 13: An interactive dashboard that tells a single story.

General guidelines

  • Place the most important visualisation at the top of the dashboard or in the upper left corner, as a viewer’s eyes are usually drawn to this corner first.
  • Structure visualisations with chained interactivity (i.e. first visualisation filters the next which filters the last view), from top to bottom and left to right.
  • Limit the number of visualisations in the dashboard to three or four.
  • Avoid using multiple colour schemes in a dashboard.
  • Try to group multiple filters together in a layout container. A light border around them gives a subtle visual cue that they belong together. The top right and the left side of the dashboard are good places for filters.
  • If a legend applies to all of the visualisations, place it with the filters. If a legend applies to only some of the visualisations, place it as close to them as possible.

User interactivity

Interactivity can either aid or obstruct visualisations and dashboards. Only use interactive views when they are necessary, such as when you need to guide a story or encourage user exploration, or when there is too much detail to show all at once. Make sure your viewers know when they can interact with visualisations and where to look for the results of their interactions. Subtle instructions such as “Select,” “Highlight” and “Click” can be useful.

Highlighting and filters

Highlights can quickly show relationships between values in a specific area or category, even across multiple views (Fig. 14). One of the best things about highlighting is that, unlike filtering, it preserves the context of the rest of the points. Filters let users drill down to a more detailed level and enable multi-level data exploration and user-driven data analysis (Fig. 15). If not used properly, however, filters can confuse users.

Fig. 14: Interactivity such as highlights show relationships between values in a specific area or category, even across multiple views.

Hyperlinking

URL actions can link to information outside of a data source. To make the link relevant to your data, try using values of the data as parameters in the URL. For example, if you have a list of Twitter users that are encoded in your data as the field <username>, you can create a URL action that points to www.twitter.com/<username>.
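The idea behind a URL action built from data values is to substitute a field's value into a URL template, URL-encoding it so that spaces and special characters survive. This Python sketch mirrors the `<field>` placeholder style of the example in the text; the function is a hypothetical illustration, not Tableau's own implementation.

```python
import re
from urllib.parse import quote


def fill_url_template(template, row):
    """Replace each <field> placeholder with the URL-encoded value from row."""
    def substitute(match):
        # safe="" also encodes "/" so a value cannot alter the URL's path.
        return quote(str(row[match.group(1)]), safe="")

    return re.sub(r"<(\w+)>", substitute, template)


url = fill_url_template("www.twitter.com/<username>", {"username": "tableau"})
```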

Fig. 15: Filters enable multi-level data exploration and user-driven data analysis.

Formatting

The formatting of a visualisation can have a large effect on how well it communicates.

Colour

Colour can make the difference between a boring visualisation and an inspiring one:

  • Try to use no more than two colour palettes. Make sure to use non-overlapping scales.
  • Select semantically meaningful colours if they apply to the context of the data. For example, in many cultures green is associated with positivity while red has a negative connotation. Consider whether any of the colours in your palette have alternative meanings that do not align with the message. When using colours that have an inherent meaning, make sure you have assigned them to relevant values.
  • Include a legend or labels where the colour choice is not obvious.
  • When using a diverging colour palette, the midpoint and end points should be meaningful. Zero is often a meaningful midpoint.
  • Avoid using colour to encode more than twelve distinct values.
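The midpoint rule for diverging palettes can be sketched as a function that interpolates from a low colour through a neutral midpoint colour to a high colour, scaling each side separately so that the midpoint (zero by default) always maps to the neutral colour, even when the data range is lopsided. The red, grey and green values echo the green/red example above but are arbitrary choices, and the function is hypothetical.

```python
def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples (t in [0, 1])."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def diverging_colour(value, vmin, vmax, midpoint=0.0,
                     low=(215, 48, 39), mid=(247, 247, 247), high=(26, 152, 80)):
    """Map value to a hex colour on a low-mid-high diverging scale."""
    if not vmin < midpoint < vmax:
        raise ValueError("midpoint must lie strictly between vmin and vmax")
    value = max(vmin, min(vmax, value))  # clamp out-of-range values
    if value < midpoint:
        # Scale the negative side on its own: 0 at vmin, 1 at the midpoint.
        rgb = lerp(low, mid, (value - vmin) / (midpoint - vmin))
    else:
        # Scale the positive side on its own: 0 at the midpoint, 1 at vmax.
        rgb = lerp(mid, high, (value - midpoint) / (vmax - midpoint))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# A lopsided range (-10 to 50): zero still maps to the neutral grey.
neutral = diverging_colour(0, -10, 50)
```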

Fonts and legibility

Only a few fonts should be used to optimise readability online: Trebuchet MS or Verdana (especially for tables and numbers), Arial, Georgia, Tahoma, Times New Roman and Lucida Sans.

In addition, Calibri and Cambria are suitable for tooltips, but are not recommended for use in other parts of a visualisation. Also consider the colour of fonts. As a general rule, axes and labels should be dark grey (this keeps them from distracting viewers’ attention away from the visualisation). Try to keep to two or three font colours per page, and make sure the formatting is consistent. For instance, all of the filters should have the same style, as well as all of the titles, but filters and titles can have different styles.

Tooltips

Tooltips – the text boxes that pop up when users hover over an object – can make the difference between a user loving the visualisation and not understanding it.

Labelling

Mark labels (the labels on the data points) can help you tell a story quickly and succinctly. It is often much easier to read a mark label than to hover over a data point for its tooltip. You can label different aspects of a visualisation, such as selected data points, outliers (by marking the minimum and maximum values), highlighted marks or the ends of lines.

Evaluate

Once the visualisation design is complete, it is time to take a step back and evaluate it. Ask yourself once more: are all the elements of the design working together well?

Acknowledgement

This article is adapted from Tableau Software’s whitepaper “Visual Analysis Best Practices: A Guidebook”.

Contact Angela Woodward, Tableau Software, awoodward@tableau.com