Thursday, December 16, 2010

Gnuplot - Stacked Histograms

Since gnuplot has no built-in pie chart style, stacked histograms are a useful alternative.
In fact, I think stacked histograms are even better: the bars can be placed side by side, which makes them much easier to compare than a series of separate pie charts.
A single pie chart may make sense on its own, but in practice the question is usually how the current chart compares to a previous one.

Here is a simple example of how to generate stacked histograms (available in gnuplot since version 4.1).
For fancier examples, see the gnuplot histogram demos.

Consider this data file (call it stackedhisto.dat):
year foo bar rest
1900 20 10 20
2000 20 30 10
2100 20 10 10
The file has one header row and three rows of data.
For each year we have measured three values, foo, bar and rest, which we want to show in two different kinds of graph.

The first graph shows the stacked histogram with the nominal (absolute) values, i.e. the height of the first bar is 50 (= 20 + 10 + 20).

The second graph shows the percentage distribution, i.e. every bar is scaled to a total of 100.
The same nominal value of 20 for foo in graph 1 becomes 40%, 33.3% and 50% in graph 2, because the yearly totals are 50, 60 and 40.
A single bar of this type is often drawn as a pie chart, so instead of comparing three pie charts (one per year) we get three bars in one graph, which are much easier to compare.
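Written out, the scaling is simple arithmetic. Using my own notation (not from the gnuplot code), let $x_{y,c}$ be the value of category $c$ (foo, bar or rest) in year $y$. The bar height in graph 1 is the row total $T_y$, and each segment in graph 2 is its share of that total, which is what the expression `(100*$2/($2+$3+$4))` in the second plot command computes for column 2:

```latex
\[
T_y = x_{y,\mathrm{foo}} + x_{y,\mathrm{bar}} + x_{y,\mathrm{rest}},
\qquad
p_{y,c} = 100 \cdot \frac{x_{y,c}}{T_y}
\]
% foo = 20 in every year, but the totals differ (50, 60, 40):
\[
p_{1900,\mathrm{foo}} = 100 \cdot \tfrac{20}{50} = 40, \quad
p_{2000,\mathrm{foo}} = 100 \cdot \tfrac{20}{60} \approx 33.3, \quad
p_{2100,\mathrm{foo}} = 100 \cdot \tfrac{20}{40} = 50
\]
```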

The gnuplot code

# Stacked histograms
set term png size 300,300
set output 'stackedhisto.png'
set title "Stacked histogram\nTotals"

# Where to put the legend
# and what it should contain
set key invert reverse Left outside
set key autotitle columnheader

set yrange [0:100]
set ylabel "total"

# Define plot style 'stacked histogram'
# with additional settings
set style data histogram
set style histogram rowstacked
set style fill solid border -1
set boxwidth 0.75

# We are plotting columns 2, 3 and 4 as y-values,
# the x-ticks are coming from column 1
plot 'stackedhisto.dat' using 2:xtic(1) \
    ,'' using 3 \
    ,'' using 4

# New graph
# We keep the settings from above except:
set output 'stackedhisto1.png'
set title "Stacked histogram\n% totals"
set ylabel "% of total"

# We are plotting columns 2, 3 and 4 as y-values,
# the x-ticks are coming from column 1
# Unlike the graph above, we need to specify
# the titles explicitly via 't 2', 't 3' and 't 4'.
plot 'stackedhisto.dat' using (100*$2/($2+$3+$4)):xtic(1) t 2\
    ,'' using (100*$3/($2+$3+$4)) t 3\
    ,'' using (100*$4/($2+$3+$4)) t 4
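The three nearly identical plot clauses can also be written as a loop. The following is only a sketch, assuming gnuplot 4.4 or newer (where `plot for [...]` iteration was introduced, so it will not work with 4.1/4.2) and assuming all the `set` commands from the script above are still in effect. It uses `columnheader(i)` to take each legend entry from the header line of stackedhisto.dat and `column(i)` to refer to the i-th column inside an expression.

```gnuplot
# Sketch (assumes gnuplot >= 4.4 and the settings from the script above):
# both graphs, with the per-column clauses replaced by 'plot for' iteration.

# Graph 1: nominal values of columns 2..4, x-tick labels from column 1
set output 'stackedhisto.png'
set title "Stacked histogram\nTotals"
set ylabel "total"
plot for [i=2:4] 'stackedhisto.dat' using i:xtic(1) title columnheader(i)

# Graph 2: each column as a percentage of the row total ($2+$3+$4)
set output 'stackedhisto1.png'
set title "Stacked histogram\n% totals"
set ylabel "% of total"
plot for [i=2:4] 'stackedhisto.dat' \
    using (100*column(i)/($2+$3+$4)):xtic(1) title columnheader(i)
```

The loop makes it easy to add a fifth category: only the data file and the loop bound `[i=2:4]` (and the row-total expression) need to change.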

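The generated graphs

[Figures: stackedhisto.png, "Stacked histogram / Totals"; stackedhisto1.png, "Stacked histogram / % totals"]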