Tuesday, January 17, 2012

How to create a histogram in OpenOffice.org

The recipe below has been tested in OpenOffice 3.0.1.
It is unclear (though expected) that it will work also in newer revisions.

The issue

Suppose you have a set of data, time data in my example, each representing when a certain event has happened. But rather having to digest the detail data you're only interested in high level information like how often did the measured event occur in an hour.

Here's the example data:
So you got a number of events at certain times, unsorted, uncounted.

Assume that these data are in column A in your spreadsheet, maybe labeled Time in the first row.

The goal

A histogram which depicts the frequency of the events per hour like this

The resulting histogram shows
3 events before 13:00
1 event between 13:00 and 14:00
2 events between 14:00 and 15:00
3 events after 15:00

How to get there

Identify the bins

In the example above identifying the bins for the histogram is rather easy: you pick full hours. It is also rather easy to find the minimum and maximum hours.
When your list of data is very long you might not easily see the minimum nor maximum nor might it be obvious how to set the bin intervals, a little trial and error is necessary to get there.

So looking at the data all events are later than 12:00 and none is beyond 16:00, therefore I'm choosing these bins:

I'm entering the bins into column B so that the spreadsheet looks like this now:

Calculate the frequencies

StarOffice contains a FREQUENCY function which takes two arrays as input and also returns an array of results (maybe something one has to get used to. The example will make it clear how to use it.
  • Enter Ticks (or any other describing string) into cell C1
  • Click into cell C2 and click on the functions icon.
  • Out of the list of functions select FREQUENCY
  • Enter your data range and your bin range into the resp. parameter fields so that it looks like this:

The spreadsheet should now look like this:

Create the chart

  • Mark columns B and C (click on B and drag towards C so that both are highlighted)
  • Insert -> Chart...
  • Step 1: leave the chart type at Column
  • Step 2: Data series in columns and First row as label should be ticked, additionally tick also First column as label
  • Step 3: simply click Next
  • Step 4: enter describing strings e.g. Histogram into Title, hours into X axis, frequency into Y axis

You're done.