Before looking at charts, let's first look at what we have. In a very general sense we have a data series and a set of bins.

The data points will be distributed into the bins according to some given criteria, and the number of entries per bin (called the **frequency**) will be counted.

While this sounds very abstract, a couple of examples will make it concrete.

- Example 1: a **numerical** data series, e.g. some kind of measurements like 1.5, 4.03, 2.6 etc. The bins are **disjoint intervals**, e.g. 0-2, 2-4, etc., and each value is assigned to a bin by a simple greater/less comparison.
- Example 2: an **ordinal** series, e.g. the list of grades of a class: A, C, A, F, B, B, C, A, A etc. The bins are the grades A, B, C, D, F, and the criterion is simply equality with the grade.

The **histogram** is then the graphical display of **bin values vs. frequency** as adjacent bars.

Since Google Charts can do column charts, we are halfway there. What is missing, and what I did, is to **generate the two-dimensional array** needed as input for the Google column chart.

My example covers only the simple numerical case. I start with a data series, some bins and, of course, a title which should explain what has actually been measured.

    var series = [ 1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9 ];
    var seriesTitle = "Length";
    var bins = [ 2, 4, 6 ];
    // There should be one for each bin plus an extra for larger values
    var binTitles = ['0-2','2-4','4-6','more'];

The bins should be interpreted as the **endpoints of intervals**, i.e. everything up to and including 2, from 2 to 4, from 4 to 6. If there are values larger than 6, an extra bin without an endpoint of its own (titled 'more' here) will be added. The histogram looks like this:

Starting with these data, a two-dimensional array called histo will be created with `var histo = new Array();`

which will eventually look like this:

    [
      [ 'Length', 'freq' ],
      [ '0-2', 2 ],
      [ '2-4', 5 ],
      [ '4-6', 3 ],
      [ 'more', 2 ]
    ]
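The row labels in the first column come from `binTitles`, which was typed by hand. As a rough sketch, they could also be derived from `bins`. The helper name `makeBinTitles`, the default label `"more"` and the assumption that the first interval starts at 0 are mine, not part of the original code.

```javascript
// Hypothetical helper (not in the original post): derive the bin labels
// from the interval endpoints instead of typing them by hand.
// Assumes the first interval starts at 0, as in the example data.
function makeBinTitles(bins, lastTitle) {
  var titles = [];
  var lower = 0; // assumed lower edge of the first bin
  for (var b = 0; b < bins.length; b++) {
    titles.push(lower + "-" + bins[b]);
    lower = bins[b];
  }
  // one extra label for everything above the last endpoint
  titles.push(lastTitle || "more");
  return titles;
}

var derivedTitles = makeBinTitles([2, 4, 6]);
// derivedTitles is ['0-2', '2-4', '4-6', 'more']
```

This keeps the labels and the endpoints in sync when the bins change, at the cost of less freedom in how the labels are worded.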

First there is a function to initialize the histo array:

    function initHisto(title, binTitles) {
      // header line
      histo.push([]);
      histo[0][0] = title;
      histo[0][1] = "freq";
      // create one row for each bin title
      for (var b = 0; b < binTitles.length; b++) {
        // create new row with the title and a count of zero
        histo.push([]);
        histo[b+1][0] = "" + binTitles[b];
        histo[b+1][1] = 0;
      }
    }
    initHisto( seriesTitle, binTitles );

The following function, called **frequency**, counts the entries per bin and puts the count into the corresponding histo cell.

    var maxFreq = 0; // Necessary to set the maximum y-value
    function frequency( series, bins ) {
      for (var d = 0; d < series.length; d++) {
        // first bin
        if ( series[d] <= bins[0] ) { histo[1][1]++; continue; }
        // last bin, which has no endpoint of its own
        if ( bins[bins.length-1] < series[d] ) { histo[bins.length+1][1]++; continue; }
        // any bin in between
        for (var b = 0; b < bins.length-1; b++) {
          if ( bins[b] < series[d] && series[d] <= bins[b+1] ) {
            histo[b+2][1]++;
            break; // a value belongs to exactly one bin
          }
        }
      }
      // remember the largest count for scaling the y-axis
      for (var h = 1; h < histo.length; h++) {
        if ( maxFreq < histo[h][1] ) { maxFreq = histo[h][1]; }
      }
    }
    frequency( series, bins );

Now that the histo array has been constructed, it can be fed to Google Charts with `google.visualization.arrayToDataTable( histo );`. The chart needs some histogram-specific tweaking, which I'll explain.

    function drawChart1() {
      var data = google.visualization.arrayToDataTable( histo );
      var numGrids;
      // if maxFreq is odd we make it even
      if ( maxFreq % 2 == 1 ) { maxFreq++; }
      // the grid lines should be every even number
      numGrids = maxFreq/2 + 1;
      var options = {
        title: 'Histogram',
        legend: { position: 'none' }, // no legend
        // in order to increase the thickness of the bars with a little space in between
        bar: { groupWidth: '99%' },
        vAxis: { title: histo[0][1], minValue: 0, maxValue: maxFreq, gridlines: { count: numGrids } },
        hAxis: { title: histo[0][0] },
        backgroundColor: { strokeWidth: 2 } // to get a nice box
      };
      var chart = new google.visualization.ColumnChart(document.getElementById('chart_divH'));
      chart.draw(data, options);
    }

I put all of the above into one section enclosed by `<script>..</script>` tags, but it could be split up, with the histo calculation done separately.
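One way to separate the calculation from the drawing is to wrap it in a function that uses no global variables and returns the finished array. The following is only a sketch: the name `buildHistogram` and the returned object shape are my assumptions, and it expects `binTitles` to have exactly one more entry than `bins`.

```javascript
// Sketch: the same counting as in the post, but without global state.
// buildHistogram is a hypothetical name, not part of the original code.
// Assumes binTitles.length === bins.length + 1 and bins sorted ascending.
function buildHistogram(title, series, bins, binTitles) {
  // header row followed by one row per bin title, all counts starting at 0
  var histo = [[title, "freq"]];
  for (var t = 0; t < binTitles.length; t++) {
    histo.push([String(binTitles[t]), 0]);
  }
  var maxFreq = 0;
  for (var d = 0; d < series.length; d++) {
    // find the first endpoint that is >= the value; if there is none,
    // b ends up at bins.length, which is the extra bin for larger values
    var b = 0;
    while (b < bins.length && series[d] > bins[b]) {
      b++;
    }
    histo[b + 1][1]++;
    if (histo[b + 1][1] > maxFreq) {
      maxFreq = histo[b + 1][1];
    }
  }
  return { histo: histo, maxFreq: maxFreq };
}

var result = buildHistogram(
  "Length",
  [1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9],
  [2, 4, 6],
  ["0-2", "2-4", "4-6", "more"]
);
// result.histo can then be passed to google.visualization.arrayToDataTable
```

The drawing function would then take `result.histo` and `result.maxFreq` as arguments instead of reading them from global variables.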

Unfortunately the chart options are not quite independent of the data.
For example, the *number of grid lines* needs to be reduced for higher frequencies in order to display nicely. The *groupWidth* needs to be made bigger if more bins are displayed, in order to see a little distinction between the bars, and it probably also depends on the final width and height of the chart.
The *width of the chart* needs to increase if a larger number of bins should be displayed nicely.
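For the grid lines, one possible approach is to double the spacing until the number of lines stays below some limit. This is a sketch under my own assumptions: `gridSettings` and its `maxLines` parameter (default 6) are hypothetical and not Google Charts settings. Only the two returned numbers are meant to be used, for the `maxValue` and `gridlines.count` options already shown in the drawing function.

```javascript
// Hypothetical helper: choose the y-axis maximum and the gridline count.
// maxLines is an assumed upper limit on the number of grid lines.
function gridSettings(maxFreq, maxLines) {
  maxLines = maxLines || 6;
  var step = 2; // the post draws a grid line on every even number
  // double the spacing until the lines (including the zero line) fit
  while (Math.ceil(maxFreq / step) + 1 > maxLines) {
    step *= 2;
  }
  // round the axis maximum up to a multiple of the spacing
  var maxValue = Math.ceil(maxFreq / step) * step;
  return { maxValue: maxValue, count: maxValue / step + 1 };
}

var grid = gridSettings(5);
// grid.maxValue is 6 and grid.count is 4, the same values the
// drawing function computes for the example data
```

In the options object this would be used as `vAxis: { minValue: 0, maxValue: grid.maxValue, gridlines: { count: grid.count } }`.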

The following element in the HTML body will display the chart:

    <div id="chart_divH" style="width: 300px; height: 300px;"></div>

If you want to use other types of data series you need to change the frequency function and instead of mathematical greater/less comparisons you need to write the appropriate code for your case.
The given 'grades' example could be something like `if( series[d]==bins[b] ) { histo[b+1][1]++; }`
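Spelled out a little further, the grades case might look like the sketch below. The function name `countCategories` and the title `"Grade"` are my own choices, and values that match no bin are silently ignored here, which may or may not be what you want.

```javascript
// Sketch for the grades case: the bins are the category names themselves.
// countCategories is a hypothetical name, not part of the original code.
function countCategories(title, series, bins) {
  // header row followed by one row per category, counts starting at 0
  var histo = [[title, "freq"]];
  for (var b = 0; b < bins.length; b++) {
    histo.push([bins[b], 0]);
  }
  for (var d = 0; d < series.length; d++) {
    var idx = bins.indexOf(series[d]);
    if (idx !== -1) {
      histo[idx + 1][1]++; // values that are not in bins are ignored
    }
  }
  return histo;
}

var grades = ["A", "C", "A", "F", "B", "B", "C", "A", "A"];
var gradeHisto = countCategories("Grade", grades, ["A", "B", "C", "D", "F"]);
// gradeHisto:
// [ ['Grade','freq'], ['A',4], ['B',2], ['C',2], ['D',0], ['F',1] ]
```

Because the bins are listed explicitly, a grade that never occurs (D in this series) still gets a bar of height zero.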
