Before looking at charts, let's first look at what we have. In a very general sense we have a data series and a set of bins.

The data points will be distributed into the bins according to some given criteria, and the number of entries per bin (called **frequency**) will be counted.

While this sounds very abstract, a couple of examples will make it clearer.

- Example 1: a **numerical** data series, e.g. measurements like 1.5, 4.03, 2.6, etc. The bins are **disjoint intervals**, e.g. 0-2, 2-4, etc., and a value is assigned to a bin by a simple greater/less comparison.
- Example 2: an **ordinal** series, e.g. the list of grades of a class: A, C, A, F, B, B, C, A, A, etc. The bins are the grades A, B, C, D, F, and a value is assigned to the bin it equals.

The **histogram** is then the graphical display of **bin values vs. frequency** as adjacent bars.

Since Google Charts can do column charts, we are halfway there. What is missing, and what I did, is to **generate the two-dimensional array** needed as input for the Google column chart.

My example covers only the simple numerical case. I start with a data series, some bins and, of course, a title that explains what has actually been measured.

    var series = [ 1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9 ];
    var seriesTitle = "Length";
    var bins = [ 2, 4, 6 ];
    // There should be one for each bin plus an extra for larger values
    var binTitles = ['0-2','2-4','4-6','more'];

The bins should be interpreted as the **endpoints of intervals**, i.e. everything up to and including 2, from 2 to 4, from 4 to 6. If there are values larger than 6, they go into an extra bin at the end (titled 'more' above). The histogram looks like this:

Starting with these data, a two-dimensional array called `histo` will be created with `var histo = new Array();`

which will eventually look like this:

    [
      [ 'Length', 'freq' ],
      [ '0-2', 2 ],
      [ '2-4', 5 ],
      [ '4-6', 3 ],
      [ 'more', 2 ]
    ]
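The labels in the first column come from `binTitles`, which is typed by hand in this post. As a minimal sketch, they could also be derived from the `bins` array. The helper name `makeBinTitles` and its `lowerBound` parameter are assumptions made for illustration and are not part of the original code.

```javascript
// Hypothetical helper (not part of the original post): build one label per
// bin from the interval endpoints, plus a final label for larger values.
function makeBinTitles(bins, lowerBound) {
  var titles = [];
  var start = lowerBound; // assumed lower end of the first interval, e.g. 0
  for (var b = 0; b < bins.length; b++) {
    titles.push(start + '-' + bins[b]);
    start = bins[b]; // the next interval starts where this one ends
  }
  titles.push('more'); // catch-all bin for values above the last endpoint
  return titles;
}

var generatedTitles = makeBinTitles([2, 4, 6], 0);
// generatedTitles is ['0-2', '2-4', '4-6', 'more']
```

This keeps the labels in sync with the endpoints when the bins change, at the cost of less freedom in how the labels are worded.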

First there is a function to initialize the histo array:

    function initHisto(title, binTitles) {
      var b;
      // header line
      histo.push([]);
      histo[0][0] = title;
      histo[0][1] = "freq";
      // create one row for each bin title
      for( b=0; b<binTitles.length; b++ ) {
        // Create new row
        histo.push([]);
        histo[b+1][0] = ""+binTitles[b];
        histo[b+1][1] = 0;
      }
    }

    initHisto( seriesTitle, binTitles );

The following function, called **frequency**, counts the entries per bin and puts the count into the corresponding histo cell.

    var maxFreq = 0; // Necessary to set the maximum y-value

    function frequency( series, bins ) {
      var d, b, h;
      for( d=0; d<series.length; d++ ) {
        // first bin
        if( series[d]<=bins[0] ) {
          histo[1][1]++;
          continue;
        }
        // last unnamed bin
        if( bins[bins.length-1]<series[d] ) {
          histo[bins.length+1][1]++;
          continue;
        }
        // any bin in between
        for( b=0; b<bins.length-1; b++ ) {
          if( bins[b]<series[d] && series[d]<=bins[b+1] ) {
            histo[b+2][1]++;
          }
        }
      }
      // remember the largest count for the y-axis
      for( h=1; h<histo.length; h++ ) {
        if( maxFreq<histo[h][1] ) {
          maxFreq = histo[h][1];
        }
      }
    }

    frequency( series, bins );
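The two steps above (initialising the table and counting) can also be combined into one function that does not rely on global variables. The following is only a sketch: the name `buildHistogram` and the shape of its return value are assumptions, and it expects `bins` to be sorted in ascending order with exactly one more entry in `binTitles` than in `bins`.

```javascript
// Sketch only: buildHistogram is a hypothetical name, not from the original post.
// Assumes bins is sorted ascending and binTitles.length === bins.length + 1.
function buildHistogram(series, bins, binTitles, title) {
  // Header row followed by one [label, count] row per bin title.
  var table = [[title, 'freq']];
  for (var t = 0; t < binTitles.length; t++) {
    table.push([String(binTitles[t]), 0]);
  }

  for (var d = 0; d < series.length; d++) {
    // Advance to the first endpoint that is >= the value. If there is none,
    // b ends up equal to bins.length, which selects the catch-all row.
    var b = 0;
    while (b < bins.length && series[d] > bins[b]) {
      b++;
    }
    table[b + 1][1]++;
  }

  // Largest count, used later for the y-axis maximum.
  var maxFreq = 0;
  for (var h = 1; h < table.length; h++) {
    maxFreq = Math.max(maxFreq, table[h][1]);
  }
  return { table: table, maxFreq: maxFreq };
}

var result = buildHistogram(
  [1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9],
  [2, 4, 6],
  ['0-2', '2-4', '4-6', 'more'],
  'Length'
);
// result.table has the counts 2, 5, 3, 2 and result.maxFreq is 5
```

Here `result.table` could be passed to `google.visualization.arrayToDataTable` in place of the global `histo`.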

Now that the histo array has been constructed, it can be fed to Google Charts with `google.visualization.arrayToDataTable( histo );`.
The chart needs some histogram-specific tweaking, which I'll explain.

    function drawChart1() {
      var data = google.visualization.arrayToDataTable( histo );
      var numGrids;
      // if maxFreq is odd we make it even
      if( maxFreq%2 == 1 ) {
        maxFreq++;
      }
      // the grid lines should be every even number
      numGrids = maxFreq/2 + 1;
      var options = {
        title: 'Histogram',
        legend: { position: 'none' }, // no legend
        // in order to increase the thickness of the bars with a little space in between
        bar: { groupWidth: '99%' },
        vAxis: { title: histo[0][1], minValue: 0, maxValue: maxFreq, gridlines: { count: numGrids } },
        hAxis: { title: histo[0][0] },
        backgroundColor: { strokeWidth: 2 } // to get a nice box
      };
      var chart = new google.visualization.ColumnChart(document.getElementById('chart_divH'));
      chart.draw(data, options);
    }

I put all of the above into one section enclosed by `<script>..</script>` tags, but it could be split up and the histo calculation done separately.

Unfortunately, the chart options are not quite independent of the data.
For example, the *number of grid lines* needs to be reduced for higher frequencies in order to display nicely. The *groupWidth* needs to be made bigger if more bins are displayed, so that a little distinction between the bars remains visible; the right value probably also depends on the final width and height of the chart.
The *width of the chart* needs to increase if a larger number of bins is to be displayed nicely.
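One way to handle part of this dependency is to compute the affected options from the data. The sketch below covers only the grid lines and the chart width; `histogramOptions` is a hypothetical helper, and every threshold in it (a step of 2 up to a frequency of 10, roughly five steps above that, about 60 pixels per bin) is an illustrative guess rather than a value from this post. The `groupWidth` is left for manual tuning.

```javascript
// Hypothetical helper: derive the data-dependent chart options.
// All thresholds are illustrative guesses, not values from the post.
function histogramOptions(maxFreq, numBins) {
  // Grid step: every second value for small counts, about five steps otherwise.
  var step = maxFreq <= 10 ? 2 : Math.ceil(maxFreq / 5);
  // Round the axis maximum up to a multiple of the step (at least one step).
  var axisMax = Math.max(step, Math.ceil(maxFreq / step) * step);
  return {
    width: Math.max(300, numBins * 60), // assumed ~60px per bin, at least 300px
    height: 300,
    vAxis: {
      minValue: 0,
      maxValue: axisMax,
      gridlines: { count: axisMax / step + 1 } // one line per step, plus the zero line
    }
  };
}

var smallOptions = histogramOptions(5, 4);
// smallOptions.vAxis.maxValue is 6 and smallOptions.vAxis.gridlines.count is 4,
// the same values the drawChart1 code computes for maxFreq = 5
```

The returned values would still have to be merged with the fixed options (title, legend, bar, axis titles) before calling `chart.draw`.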

The chart is displayed by the following element in the HTML body:

    <div id="chart_divH" style="width: 300px; height: 300px;"></div>

If you want to use other types of data series, you need to change the frequency function: instead of mathematical greater/less comparisons, write the appropriate code for your case.
For the 'grades' example, the comparison could be something like `if( series[d]==bins[b] ) { histo[b+1][1]++; }`
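Spelled out for the grades case, a complete counting function might look like the following sketch. The name `countCategories` and the 'Grade' title are assumptions; the function builds its own table instead of using the global `histo`, and it silently ignores values that match no bin.

```javascript
// Sketch for categorical data; countCategories is a hypothetical name.
function countCategories(series, bins, title) {
  // Header row followed by one [category, count] row per bin.
  var table = [[title, 'freq']];
  for (var b = 0; b < bins.length; b++) {
    table.push([bins[b], 0]);
  }
  for (var d = 0; d < series.length; d++) {
    var index = bins.indexOf(series[d]); // equality replaces greater/less
    if (index !== -1) {                  // values that match no bin are ignored
      table[index + 1][1]++;
    }
  }
  return table;
}

var gradeTable = countCategories(
  ['A', 'C', 'A', 'F', 'B', 'B', 'C', 'A', 'A'],
  ['A', 'B', 'C', 'D', 'F'],
  'Grade'
);
// gradeTable: [['Grade','freq'], ['A',4], ['B',2], ['C',2], ['D',0], ['F',1]]
```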
