Graphical Integrity and Redesign

last updated 12/10/19

Principles of Graphical Integrity

These principles are attributed to Edward Tufte.

The material below presents examples of both excellent and problematic graphics, along with summaries of design goals.

Most visualization tools today produce good designs, but they can be misused, so these principles are worth understanding.

Integrity

Tell the truth about the data -- above all else show the data

Show data variation, not design variation.

Make large data sets coherent

Reveal the data at several levels of detail from broad overview to the fine structure


False graphics

Look closely at the baselines of the three charts.

Mines report

What principles are violated?

 

Another similar graphic, found in October 2008:

Vanderbilt financial aid

What principles are violated?

 

Numbers have magnitude, and values that are numerically close should also appear close in the graphic.

Insane per capita in PA


Perceptions of area

Perception of area versus magnitude varies from person to person. On average, the perceived area of a circle grows more slowly than the actual area:

$$\text{perceived area} = (\text{actual area})^{0.8 \pm 0.3}$$

so the exponent ranges from 0.5 to 1.1. If the actual area is 4, the perceived area ranges from $4^{0.5} = 2$ to $4^{1.1} \approx 4.6$; if the circle's area grows to 8, the perceived area ranges from $8^{0.5} \approx 2.8$ to $8^{1.1} \approx 9.8$. That is, when the area doubles, one viewer might perceive only about a 41% increase ($2^{0.5} \approx 1.41$) while another perceives about a 114% increase ($2^{1.1} \approx 2.14$) in the same visual.
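The arithmetic above can be checked with a short Python sketch. It simply applies the quoted power law at its two extreme exponents (0.5 and 1.1); the function name `perceived_range` is hypothetical.

```python
# Perceived area under the power law quoted above:
# perceived = actual ** exponent, with exponent = 0.8 +/- 0.3 (i.e., 0.5 to 1.1).
LOW_EXP, HIGH_EXP = 0.8 - 0.3, 0.8 + 0.3


def perceived_range(actual_area: float) -> tuple[float, float]:
    """Return (lowest, highest) perceived area for a given actual area."""
    return actual_area ** LOW_EXP, actual_area ** HIGH_EXP


for area in (4, 8):
    low, high = perceived_range(area)
    print(f"actual {area}: perceived {low:.1f} to {high:.1f}")

# Perceived growth when the actual area doubles, for each extreme exponent.
for exp in (LOW_EXP, HIGH_EXP):
    print(f"exponent {exp:.1f}: doubling looks like a {2 ** exp - 1:.0%} increase")
```

Because the exponent is applied to the whole area, the perceived ratio for a doubling is simply $2^{\text{exponent}}$, independent of the starting size.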

Using two-dimensional objects to represent scalars is inherently misleading, especially if the diameter is made proportional to the scalar being shown: area grows with the square of the diameter, so growth in the value is perceived as squared (doubling the value quadruples the area). If you use such objects, make sure the area is what actually represents the value.
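A minimal sketch of the difference, assuming circles are sized by radius in some drawing library (the helper names `radius_for_value` and `naive_radius` are hypothetical). For area to be proportional to the value, the radius must grow with the square root of the value.

```python
import math


def radius_for_value(value: float, scale: float = 1.0) -> float:
    """Radius that makes the circle's AREA proportional to value.

    area = pi * r**2 = scale * value  =>  r = sqrt(scale * value / pi)
    """
    return math.sqrt(scale * value / math.pi)


def naive_radius(value: float, scale: float = 1.0) -> float:
    """Misleading choice: diameter (hence radius) proportional to value."""
    return scale * value / 2


for v in (1, 2, 4):
    area_true = math.pi * radius_for_value(v) ** 2
    area_naive = math.pi * naive_radius(v) ** 2
    print(f"value {v}: area-scaled {area_true:.2f}, diameter-scaled {area_naive:.2f}")
```

With the diameter-scaled version, each doubling of the value quadruples the drawn area, which is exactly the squared growth described above.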

Circles and area

Top row: number is proportional to the diameter
Bottom row: number is proportional to the area


Lie Factor

Lie factor = size of effect shown in graphic / size of effect in data

Fuel Economy

So there's a 53% increase in fuel economy, but the line drawn has a 783% increase. The lie factor is 783/53 = 14.8.
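The lie factor is a simple ratio, so a sketch mostly serves to make the definition concrete. The percentages below are the ones quoted for the fuel economy example; the raw start and end values in the second usage are hypothetical, and the function names are illustrative.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    if old == 0:
        raise ValueError("percent change is undefined when old == 0")
    return (new - old) / old * 100


def lie_factor(effect_shown: float, effect_in_data: float) -> float:
    """Lie factor = size of effect shown in graphic / size of effect in data."""
    if effect_in_data == 0:
        raise ValueError("the data must show a nonzero effect")
    return effect_shown / effect_in_data


# Percent changes quoted in the fuel economy example above.
print(round(lie_factor(783, 53), 1))  # 14.8

# Hypothetical raw values: a drawn line grows from 2.0 to 6.0 cm
# while the data value grows from 10 to 15.
shown = percent_change(2.0, 6.0)   # 200.0
actual = percent_change(10, 15)    # 50.0
print(lie_factor(shown, actual))   # 4.0
```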

The lie factor should be close to 1; a lie factor of about 1 means the graphic tells the truth.


 

Design and Data Variation

There should be consistency across the entire graphic, because expectations are set where the reader starts looking.

A bad example from the New York Times:

OPEC Oil Prices

Keep the axis scale the same throughout the graphic.


Visual Area and Numerical Measure

Consistency!

OPEC barrel prices

Considering the volume of the barrels, the lie factor is 59.4.
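Why volume inflates the lie factor so much: if every linear dimension of a glyph is scaled by the same factor r, its area scales by r² and its volume by r³. A minimal sketch, assuming a hypothetical glyph whose height, width, and depth all grow in proportion to the data value (this does not reconstruct the actual barrel dimensions in the graphic; `lie_factor_scaled` is an illustrative name):

```python
def lie_factor_scaled(ratio: float, dims: int) -> float:
    """Lie factor when each of `dims` linear dimensions is scaled by `ratio`.

    The data value grows by (ratio - 1); the drawn size (length, area,
    or volume) grows by (ratio**dims - 1).
    """
    if ratio == 1:
        raise ValueError("ratio must differ from 1 to define an effect")
    return (ratio ** dims - 1) / (ratio - 1)


for dims, name in ((1, "length"), (2, "area"), (3, "volume")):
    print(f"{name}: lie factor {lie_factor_scaled(2.0, dims):.1f} for a doubled value")
```

Even a modest doubling gives a lie factor of 7 when read as volume; larger increases drive it much higher, which is how a value like 59.4 arises.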

 


Chart Junk: Vibration

Vibration effect

Too much texture creates vibration in this graphic: a moiré effect.

Overuse of texture in a graphic

Some experts argue that such texture is eye-catching and therefore good.


Chart Junk: Illusions

An unintentional Necker illusion: the back planes optically flip to the front.

Necker Illusion


More Junk

Cross-hatching, as in these samples, is easy to produce and is another common source of chart junk.

hatches examples


Grid Junk

The background grid is generally classified as junk.

A train schedule in France (featured on the cover of one of Tufte's books).

Train schedule in France 1880

 

Here's the France train schedule with lightened grid lines.

France train schedule with better grid lines

 


Data-Ink Ratio

Here, ink means non-white pixels. We want a high ratio of the ink used to present data to the total ink used. White or background pixels are not counted.


$$
\begin{aligned}
\text{data-ink ratio} &= \frac{\text{data-ink}}{\text{total ink used in graphic}} \\
&= \text{proportion of ink devoted to the non-redundant display of information} \\
&= 1 - \text{proportion of graphic that can be erased without loss of information}
\end{aligned}
$$

where data-ink is the pixels used directly for data and total ink is all non-background pixels.
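The ratio can be computed directly once each pixel is classified. A minimal sketch, assuming a hypothetical encoding in which each character is one pixel: `.` is background (not counted), `D` is data-ink, and `x` is non-data ink such as frames and gridlines. Real images would need their own classification step.

```python
def data_ink_ratio(pixels: list[str]) -> float:
    """Data-ink pixels divided by all non-background pixels."""
    data = sum(row.count("D") for row in pixels)
    total = sum(len(row) - row.count(".") for row in pixels)
    if total == 0:
        raise ValueError("graphic contains no ink")
    return data / total


cluttered = [
    "xxxxxxxx",  # top frame (non-data ink)
    "x..D...x",
    "x..D.D.x",
    "xD.D.D.x",
    "xxxxxxxx",  # bottom frame
]
erased = [
    "...D....",
    "...D.D..",
    ".D.D.D..",
    "xxxxxxxx",  # keep one baseline for reference
]

print(f"{data_ink_ratio(cluttered):.2f}")  # 0.21
print(f"{data_ink_ratio(erased):.2f}")     # 0.43
```

Erasing the frame and side borders leaves the data untouched but doubles the ratio, which is the "erase non-data ink" principle in miniature.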

We want a ratio close to 1, as in the electroencephalogram, where every pixel except the labels represents data.

Electroencephalogram

Good ratio:

Size and cycle

Bad ratio, prediction of voter registration:

Voter registration prediction

A better representation of the same data:

Voter registration prediction, better


Erasing Principles

Tufte's principles: maximize the data-ink ratio, and erase non-data ink, within reason.

Some ink is still needed for labeling and explanation.

Bar chart element redundancy

 

Redundant data-ink erasure

A shaded, vertical, labeled bar chart displays the number up to 6 times:

  1. left line length
  2. right line length
  3. shaded area
  4. top line position
  5. position of the number
  6. the value of the number

An erasure example

Erasure example

However, sometimes redundancy is useful.

Consider the repetition in the train schedule (the red line marks the repeated portion). Here the wraparound helps the reader follow the continuum from late night through morning.

France train sched. repeated

 


Double functioning labels

Minimizing the amount of ink (pixels) takes advantage of the pre-attentive capabilities of our eyes.

In this example, the left-hand scale serves as the y-axis, and the points on the graph serve both as the x-axis scale and as the curve itself.

axes from the data

Another example is a stem-and-leaf plot, where the marks themselves are again the data.

Stem and leaf of volcano heights

A stem-and-leaf plot doubles as a histogram, and medians, quartiles, or percentiles can be found quickly by counting leaves.

A stem-and-leaf plot is a good hand-drawn graphic. It works when the digits are fixed-width, because each line's length then represents a quantity (the number of values on that stem).

This particular graphic might be better drawn from high to low, to mimic the typical y-axis orientation.
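A stem-and-leaf display is easy to generate. A minimal sketch, assuming non-negative integers with the tens as the stem and the units digit as the leaf; the sample values are hypothetical (not the volcano data), and `high_to_low` implements the orientation suggested above.

```python
from collections import defaultdict
from statistics import median


def stem_and_leaf(values: list[int], high_to_low: bool = True) -> list[str]:
    """Build a stem-and-leaf display for non-negative integers.

    Stem = tens, leaf = units digit. Each leaf is one fixed-width
    character, so line length acts as a histogram bar.
    """
    if not values:
        return []
    leaves: dict[int, list[int]] = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include empty stems so gaps in the distribution stay visible.
    stems = list(range(min(leaves), max(leaves) + 1))
    if high_to_low:
        stems.reverse()  # largest values at the top, like a y-axis
    return [f"{s:>3} | {''.join(str(d) for d in leaves[s])}" for s in stems]


sample = [12, 15, 21, 23, 23, 28, 34, 37, 41, 55]  # hypothetical values
for line in stem_and_leaf(sample):
    print(line)
print("median:", median(sample))  # 25.5: average of the 5th and 6th leaves
```

Counting leaves from the bottom to the 5th and 6th entries (23 and 28) gives the same median by hand.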


Keys, Labels and Legends

Devoid of marks:

no marks / with marks

Varying degrees of marks

excessive marks / moderate marks / minimal marks

Grid spacings: illogical and logical

illogical grid spacing / logical grid spacing

Data ranges: illogical and logical

illogical data range / logical data range
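One common way to get logical grid spacing and data ranges is the "nice numbers" heuristic described by Paul Heckbert in Graphics Gems: pick a tick step of 1, 2, or 5 times a power of ten, then extend the axis range outward to whole multiples of that step. The sketch below is an illustrative Python version of that idea, not the method used to produce the figures above; the function names are hypothetical.

```python
import math


def nice_number(x: float, round_result: bool) -> float:
    """Return a 'nice' number (1, 2, 5, or 10 times a power of ten) near x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # roughly in [1, 10)
    if round_result:  # nearest nice number
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:  # smallest nice number >= fraction
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def nice_ticks(lo: float, hi: float, max_ticks: int = 5) -> list[float]:
    """Tick positions with a nice step; the range is extended to whole steps."""
    if hi <= lo or max_ticks < 2:
        raise ValueError("need lo < hi and max_ticks >= 2")
    span = nice_number(hi - lo, round_result=False)
    step = nice_number(span / (max_ticks - 1), round_result=True)
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    count = round((stop - start) / step)
    # Round away floating-point noise such as 0.6000000000000001.
    decimals = max(0, -math.floor(math.log10(step)))
    return [round(start + i * step, decimals) for i in range(count + 1)]


print(nice_ticks(3, 97))       # [0, 20, 40, 60, 80, 100]
print(nice_ticks(0.13, 0.87))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Data running from 3 to 97 thus gets gridlines every 20 on a 0 to 100 axis, rather than an illogical step such as 18.8 starting at 3.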

Selecting and modifying views (interaction is covered later)

Iris example visualizations from XmdvTool (scatterplot matrix, star plot, and parallel coordinates plot)

iris view 1 / iris view 2 / iris view 3

 


Use color with care

too many colors / moderate number of colors

changing hue / changing both hue and saturation


Misleading visualizations

raw data, no correlation / scrubbed data with apparent correlation

perspective distorts

baseline 1 / baseline 2

 

 


Hierarchy in graphics

A graphic should have at least three levels of viewing:

  1. what is seen at a distance-- seeing the overall picture, summaries, etc.
  2. what is seen in the details -- individual data points and tight relationships
  3. what is seen implicitly -- the information that may be surprising or not otherwise revealed

Consider the population densities from "http://upload.wikimedia.org/wikipedia/commons/thumb/9/90/USA-2000-population-density.gif/450px-USA-2000-population-density.gif"

population densities 2005

Germany population map


Simple hierarchical graph

Another graphic showing the GDP percentage over a decade for various countries.

Read it: what do you see?

gdp trends 70s


Relational graphics

Here the time dimension is removed from the axis so that the relationship between two other dimensions can be seen.

Consider the relationship between the inflation rate and the unemployment rate, with the time axis (z) projected out, or collapsed.

Inflation rate and unemployment rate compared

There are some comparison issues. What can you note?

 


Time-series

A common type of graphic in which the x-axis is in units of time.

New York City weather for 1980

Notice the level of detail:

  • you can pick out the temperature for each day
  • you can compare it to the humidity

and the overall view:

  • trends
  • aberrations

New York City Weather 1980


Parallel time-series

Parallel time-series of three separate measures attempting to show relationships: Playfair's graphic of wheat prices, wages and reigns of royalty.

Playfair's comparison of royalty, wages and prices

 


Narrative Graphics of Space and Time

Minard's map of Napoleon's army marching on Russia, once again the classic example.

minards map

Interactive 3D graphics can also be appropriate here, as in this version:

http://www.arcgis.com/apps/CEWebViewer/viewer.html?3dWebScene=2b48caaabd0e44028724c5f109f3de97

 


Aesthetics summary

Attractive displays:

Consider collecting many small statistics and breakdowns into a single supertable.

Organize number-laden text into tables, and use graphics when the summary, or the information a graphic would reveal, warrants it.


The friendly graphic

| Friendly | Unfriendly |
|---|---|
| words are spelled out; no mysterious encodings | many abbreviations that the reader must decode |
| words run left to right, as in normal text | words run vertically or in several directions |
| little messages help explain the data (use terse phrases) | the graphic requires repeated reference to scattered text in a narrative at some distance from the graphic |
| labels are placed on the graphic itself, eliminating a separate legend; any legend follows a logical pattern | obscure codings require repeatedly consulting the legend, e.g. elaborate encoded shadings, cross-hatching, and color codes |
| the graphic attracts the viewer and provokes curiosity; every visual characteristic has meaning | the graphic is full of chart junk |
| colors are chosen so that viewers with color blindness can make sense of the graphic (blue is best) | the design is insensitive to color blindness (e.g., relies on red versus green) |
| type is clear, in upper and lower case, with serifs | all capitals, sans serif |

Some final suggestions: