Thursday, January 31, 2013

Pareto charts with Google charts

As a follow-up to my recent post about histograms in Google charts I thought it would be nice to have a Pareto chart too.

I often use Pareto charts only in the very simple sense of showing bars in descending order, but if one wants to include the cumulative percentage line there are two challenges:

  • for a given data set one needs to calculate the accumulated percentages
  • the chart needs to show two graphs (columns and line) and two axes (left for bars, right for percentages)

    The first challenge needs to be coded; the second can be handled with so-called Combo Charts.

    Assume we have the following data represented as a 2-dimensional array in JavaScript, not necessarily ordered since it will be sorted anyway. The column titles are kept in a separate array with an additional title for the percentages.

    var dataSet1 = [
        [ 'A', 25 ],
        [ 'B', 125 ],
        [ 'C', 35 ],
        [ 'D', 10 ],
        [ 'E', 12 ],
        [ 'F', 70 ],
        [ 'G', 60 ]
        ];
    
    var dataTitle = [ 'Category', 'Size', 'Pctg' ];
    

    The Pareto chart looks like this:

    How did I get there?

    There is a function which

  • first sorts the array by column 2 in descending order (this is done by supplying the JavaScript built-in 'sort' with a comparison function of our own)
  • then calculates the total of column 2, calculates the accumulated percentages and puts them into a new column in each row
  • finally prepends the title row to the data array

    function paretorize() {
      // Sort the dataSet1 array using column 2 in descending order
      dataSet1.sort( function(a,b) {
        return b[1] - a[1];
      });
    
      // Calculate the total of column 2
      var sum = 0;
      for(var row=0; row<dataSet1.length; row++) {
        sum += dataSet1[row][1];
      }
    
      // Calculate the accumulating percentages
      // and add them into a new column in each row
      var accum = 0;
      for(row=0; row<dataSet1.length; row++) {
        dataSet1[row].push( accum+100*dataSet1[row][1]/sum );
        accum = dataSet1[row][2];
      }
    
      // Add the title row at the beginning of dataSet1
      // ('unshift' returns undefined in IE8 and earlier, so its return value is not used)
      dataSet1.unshift( dataTitle );
    }
    
    paretorize();
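
    The function above changes dataSet1 in place, so it must only be called once. As a variation, here is a minimal sketch of the same three steps as a function that leaves its input untouched and returns a new array. The names toPareto, rows and titles are made up for this illustration and are not part of Google charts.

```javascript
// Sketch: build a Pareto table without modifying the input rows.
// 'toPareto', 'rows' and 'titles' are illustrative names only.
function toPareto(rows, titles) {
  // Copy each row so that the caller's array stays untouched
  var sorted = [];
  for (var i = 0; i < rows.length; i++) {
    sorted.push(rows[i].slice());
  }

  // Sort by the second column in descending order
  sorted.sort(function (a, b) { return b[1] - a[1]; });

  // Total of the second column
  var sum = 0;
  for (i = 0; i < sorted.length; i++) {
    sum += sorted[i][1];
  }

  // Append the cumulative percentage as a third column
  var running = 0;
  for (i = 0; i < sorted.length; i++) {
    running += sorted[i][1];
    sorted[i].push(sum > 0 ? 100 * running / sum : 0);
  }

  // Title row first, then the data rows
  return [titles.slice()].concat(sorted);
}

var paretoInput = [
  ['A', 25], ['B', 125], ['C', 35], ['D', 10], ['E', 12], ['F', 70], ['G', 60]
];
var paretoTable = toPareto(paretoInput, ['Category', 'Size', 'Pctg']);
console.log(paretoTable[1][0]); // B, the largest category
```

    The returned array has the same shape as dataSet1 after paretorize() and could be passed to arrayToDataTable in the same way.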
    

    Now that the dataSet1 array has been constructed it can be fed to Google charts, and a few options need to be set to take care of the second graph and axis. This is achieved by making vAxes an array of two axes with their own titles (and other attributes if needed) and by setting seriesType and series to specify the standard behaviour and the special setting for the line chart.

    function drawChart() {
      var data = google.visualization.arrayToDataTable( dataSet1 );
    
      var options = {
        title:  'Pareto chart',
        legend: { position: 'none' },         // no legend
        // Create two vertical axes taking their titles from the first row
        vAxes:[
           { title: dataSet1[0][1], minValue: 0 }, 
           { title: dataSet1[0][2], minValue: 0, maxValue: 100 }
         ],
        hAxis:  { title: dataSet1[0][0] },
        backgroundColor: {strokeWidth: 2 },   // to get a nice box
        seriesType: "bars",                   // the standard chart type
        // the second data column should be of type 'line' and should be associated with the second vertical axis
        series: {1: {type: "line", targetAxisIndex: 1 }}
        };
    
      // Note: this calls a ComboChart !!!
      var chart = new google.visualization.ComboChart(document.getElementById('chart_div'));
      chart.draw(data, options);
    }
    

    The same comments about chart width, number of data rows and size of columns apply as in the histogram post.

    Putting it all together

    The code snippets above are JavaScript and need to be put into an HTML page as follows in order to display the chart.
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);
    
    ... put here all three code snippets from above ...
    
    </script>
    <div id="chart_div" style="width: 400px; height: 300px;"></div>
    

    One thing to note here: if you create several charts on one HTML page, each of them needs its own id and its own drawing function, and each function needs to be registered separately, like this:

    google.setOnLoadCallback(drawChartA);
    google.setOnLoadCallback(drawChartB);
    google.setOnLoadCallback(drawChartC);
    ...
    
    function drawChartA() {
    ...
      // ComboChart or any other chart type of course
      var chart = new google.visualization.ComboChart(document.getElementById('chart_divA'));
    ...
    }
    
    function drawChartB() {
    ...
      var chart = new google.visualization.ComboChart(document.getElementById('chart_divB'));
    ...
    }
    
    function drawChartC() {
    ...
      var chart = new google.visualization.ComboChart(document.getElementById('chart_divC'));
    ...
    }
    
    and later place the chart wherever needed
    <div id="chart_divA" style="width: 400px; height: 300px;"></div>
    ...
    <div id="chart_divB" style="width: 400px; height: 300px;"></div>
    ...
    <div id="chart_divC" style="width: 400px; height: 300px;"></div>
    
    Friday, January 25, 2013

    Histograms with Google charts

    It seems that today (Jan 2013) there is no histogram chart in Google charts, so continuing my experiments with column charts from the previous post I decided to look at this particular problem.

    Before looking at charts let's first look at what we have. In a very general sense we have

  • a series of data
  • a list of bins.
    The data points will be distributed into the bins according to some given criteria and the number of entries per bin (called frequency) will be counted.

    While this sounds very abstract a couple of examples will explain this.

    • Example 1: a numerical data series, e.g. measurements like 1.5, 4.03, 2.6 etc. The bins are disjoint intervals, e.g. 0-2, 2-4 etc., and the values are distributed by simple numerical comparison (greater or less).
    • Example 2: an ordinal series, e.g. the list of grades of a class A,C,A,F,B,B,C,A,A etc. The bins are the grades A,B,C,D,F and the criterion is simply equality.

    The histogram is then the graphical display of bin values vs. frequency as adjacent bars.

    Since Google charts can do column charts we are halfway there. What is missing, and what I did, is generating the two-dimensional array needed as input for the Google column chart.

    My example covers only the simple numerical case. I start with a data series and some bins and of course a title which should explain what actually has been measured.

    var series = [   1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9 ];
    var seriesTitle = "Length";
    
    var bins      = [   2,    4,    6 ];
    // There should be one for each bin plus an extra for larger values
    var binTitles = ['0-2','2-4','4-6','more'];  
    
    
    The bins should be interpreted as the end points of intervals, i.e. everything up to and including 2, above 2 up to 4, above 4 up to 6. Values larger than 6 are counted in an additional last bin (titled 'more' here). The histogram looks like this:

    Starting with these data a two-dimensional array called histo is created with var histo = new Array(); and will eventually look like this:

    [ [ 'Length', 'freq' ],
      [ '0-2',  2 ],
      [ '2-4',  5 ],
      [ '4-6',  3 ],
      [ 'more', 2 ] ]
    

    First there is a function to initialize the histo array

    function initHisto(title, binTitles) {
      // header line
      histo.push([]);
      histo[0][0]   = title;
      histo[0][1]   = "freq";
    
      // create one row for each bin title
      for(var b=0; b<binTitles.length; b++ ) {
        // Create new row
        histo.push([]);
        histo[b+1][0]       = ""+binTitles[b];
        histo[b+1][1]       = 0;
      }
    }
    
    initHisto( seriesTitle, binTitles );
    
    The following function, called frequency, counts the entries per bin and puts the counts into the corresponding histo cells.
    var maxFreq     = 0;    // Necessary to set the maximum y-value
    
    function frequency( series, bins ) {
      for(var d=0; d<series.length; d++ ) {
        // first bin
        if( series[d]<=bins[0] ) {
            histo[1][1]++;
            continue;
        }
        // last unnamed bin
        if( bins[bins.length-1]<series[d] ) {
            histo[bins.length+1][1]++;
            continue;
        }
        // any bin in between
        for(var b=0; b<bins.length-1; b++ ) {
           if( bins[b]<series[d] && series[d]<=bins[b+1] ) {
              histo[b+2][1]++;
           }
        }
      }
    
      for(var h=1; h<histo.length; h++ ) {
        if( maxFreq<histo[h][1] ) {
          maxFreq   = histo[h][1];
        }
      }
    }
    
    frequency( series, bins );
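
    The three cases in frequency (first bin, last bin, any bin in between) can also be folded into a single search for the first end point that is not smaller than the value. The following is only a sketch under the same convention as above (upper end points are inclusive, one extra bin for larger values). The name countFrequencies is made up for this illustration, and the function returns a plain array of counts instead of filling histo.

```javascript
// Sketch: count how many values fall into each bin.
// 'countFrequencies' is an illustrative name; 'edges' holds the upper
// end points of the bins, sorted in ascending order.
function countFrequencies(values, edges) {
  // one counter per bin plus one for values above the last end point
  var counts = [];
  for (var i = 0; i <= edges.length; i++) {
    counts.push(0);
  }

  for (var v = 0; v < values.length; v++) {
    // find the first end point that is not smaller than the value
    var bin = 0;
    while (bin < edges.length && values[v] > edges[bin]) {
      bin++;
    }
    counts[bin]++;
  }
  return counts;
}

var binCounts = countFrequencies(
  [1, 3, 5, 7, 2.5, 3.1, 0.45, 5.1, 8.3, 4, 5.11, 3.9],
  [2, 4, 6]
);
console.log(binCounts); // [ 2, 5, 3, 2 ]
```

    For the example series this gives the same frequencies as the histo array shown earlier.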
    

    Now that the histo array has been constructed it can be fed to the Google charts like google.visualization.arrayToDataTable( histo );. The chart needs some histogram specific tweaking which I'll explain.

    function drawChart1() {
      var data = google.visualization.arrayToDataTable( histo );
    
      var numGrids;
      //  if maxFreq is odd we make it even
      if( maxFreq%2 == 1 ) {
        maxFreq++;
      }
      //  the grid lines should be every even number
      numGrids        = maxFreq/2 +1;
    
      var options = {
        title:  'Histogram',
        legend: { position: 'none' },     // no legend
        bar:    { groupWidth: '99%' },    // in order to increase the thickness of the bars with a little space in between
        vAxis:  { title: histo[0][1], minValue: 0, maxValue: maxFreq, gridlines: { count: numGrids } },
        hAxis:  { title: histo[0][0] },
        backgroundColor: {strokeWidth: 2 }    // to get a nice box
      };
    
      var chart = new google.visualization.ColumnChart(document.getElementById('chart_divH'));
      chart.draw(data, options);
    }
    

    I put all of the above into one section enclosed by <script>..</script> tags but it could be separated and the histo calculation can be done separately.
    Unfortunately the chart options are not quite independent of the data. For example, the number of grid lines needs to be reduced for higher frequencies in order to display nicely. The groupWidth needs to be made bigger if more bins are displayed in order to see a little distinction between the bars, and it probably also depends on the final width and height of the chart. The width of the chart needs to increase if a larger number of bins is to be displayed nicely.
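
    One possible way to make the grid lines follow the data is to pick the grid step from a list of "nice" values (1, 2, 5, 10, 20, ...) until the number of lines is small enough. This is just a sketch of that idea and not something Google charts provides. The name gridSettings and the limit maxLines are assumptions of this example.

```javascript
// Sketch: choose a grid step so that at most 'maxLines' grid lines are drawn.
// 'gridSettings' and 'maxLines' are illustrative names only.
function gridSettings(maxFreq, maxLines) {
  // at least two lines (bottom and top) are always needed
  if (maxLines < 2) {
    maxLines = 2;
  }
  var steps = [1, 2, 5];
  var factor = 1;
  while (true) {
    for (var s = 0; s < steps.length; s++) {
      var step = steps[s] * factor;
      // round the maximum up to the next multiple of the step
      var top = Math.ceil(maxFreq / step) * step;
      // lines at 0, step, 2*step, ..., top
      var count = top / step + 1;
      if (count <= maxLines) {
        return { maxValue: top, count: count };
      }
    }
    factor *= 10;
  }
}

var grid = gridSettings(37, 6);
console.log(grid.maxValue, grid.count); // 40 5
```

    The result could then be used in the options as vAxis: { minValue: 0, maxValue: grid.maxValue, gridlines: { count: grid.count } }.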

    This will display the chart in the HTML body part.

    <div id="chart_divH" style="width: 300px; height: 300px;"></div>
    

    If you want to use other types of data series you need to change the frequency function and instead of mathematical greater/less comparisons you need to write the appropriate code for your case. The given 'grades' example could be something like if( series[d]==bins[b] ) { histo[b+1][1]++; }
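
    For the 'grades' example a complete version could look like the following sketch. The name countCategories and the title 'Grade' are made up for this illustration. Note that Array.prototype.indexOf is not available in IE8 and earlier, where a loop with == as shown above would be needed instead.

```javascript
// Sketch: frequency table for a categorical series such as grades.
// 'countCategories' is an illustrative name, not a Google charts function.
function countCategories(title, values, categories) {
  // header row followed by one row per category, all counts starting at 0
  var table = [[title, 'freq']];
  for (var c = 0; c < categories.length; c++) {
    table.push([categories[c], 0]);
  }

  for (var v = 0; v < values.length; v++) {
    var idx = categories.indexOf(values[v]);
    if (idx >= 0) {          // values without a matching bin are ignored
      table[idx + 1][1]++;
    }
  }
  return table;
}

var grades = ['A', 'C', 'A', 'F', 'B', 'B', 'C', 'A', 'A'];
var gradeHisto = countCategories('Grade', grades, ['A', 'B', 'C', 'D', 'F']);
console.log(gradeHisto[1]); // [ 'A', 4 ]
```

    The resulting array has the same layout as histo and can be passed to arrayToDataTable unchanged.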

    Thursday, January 24, 2013

    Stacked column charts with Google charts (compared to gnuplot)

    Some time ago I had written about stacked histograms in gnuplot.

    Today I wanted to see how to achieve the same result with Google charts.

    For the purpose of this first exercise I was using the client side only and didn't worry about where the data should or could come from (in gnuplot data came from a file, naturally) but I put them directly into the JavaScript code.

    So first take a look at the result.

    As you can see the charts are pretty similar to gnuplot. There is currently one caveat in Google charts: multiline titles are not supported, so the title is displayed as one line.


    So here is the JavaScript code.
    The code is a slight modification of the Google charts column chart example and of course the calculation of the percentages for the second chart is the only piece where some real additional work is required.
    I left all settings at their defaults (like colours, fonts etc.) except for the obvious y-axis title and height, chart title, thickness of chart border (in order to get a nice box) and the most important isStacked setting.

    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["corechart"]});
     
      var dataSet = [
            ['year', 'foo', 'bar', 'rest'],
            ['1900', 20, 10, 20],
            ['2000', 20, 30, 10],
            ['2100', 20, 10, 10]
            ];
    
      // The first chart
    
      google.setOnLoadCallback(drawChart1); 
      function drawChart1() {
        var data = google.visualization.arrayToDataTable( dataSet );
    
        var options = {
          title: 'Stacked histogram\nTotals',
          vAxis: {title: 'total', maxValue: 100},  // sets the maximum value
          backgroundColor: {strokeWidth: 2 },  // to draw a nice box all around the chart
          isStacked: true                      //  = rowstacked in gnuplot
        };
    
        var chart = new google.visualization.ColumnChart(document.getElementById('chart_div1'));
        chart.draw(data, options);
      }
    
      // The second chart
    
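      // Copy each row: a plain assignment would only create a second
      // reference to dataSet, and the percentages would overwrite the totals
      var dataSet2    = [];
      for(var r=0; r<dataSet.length; r++) {
        dataSet2.push( dataSet[r].slice() );
      }
      google.setOnLoadCallback(drawChart2);
      function drawChart2() {
        // Calculate the percentages
        var sum = new Array();
        for(var row=1; row<dataSet.length; row++) {
          sum[row]        = 0;
          for(var col=1; col<dataSet[row].length; col++) {
            sum[row]        += dataSet[row][col];
          }
          for(col=1; col<dataSet[row].length; col++) {
            dataSet2[row][col]      = 100*dataSet[row][col]/sum[row];
          }
        }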
    
        var data = google.visualization.arrayToDataTable(dataSet2);
    
        var options = {
          title: 'Stacked histogram\n% Totals',
          vAxis: {title: 'total', maxValue: 100},  // sets the maximum value
          backgroundColor: {strokeWidth: 2 },  // to draw a nice box all around the chart
          isStacked: true                      //  = rowstacked in gnuplot
        };
    
        var chart = new   google.visualization.ColumnChart(document.getElementById('chart_div2'));
        chart.draw(data, options);
      }
    </script>
    
    <div style="display: table;">
      <div style="display: table-row">
        <div id="chart_div1" style="width: 300px; height: 300px; display: table-cell;"></div>
        <div id="chart_div2" style="width: 300px; height: 300px; display: table-cell;"></div>
      </div>
    </div>
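
    The percentage calculation is independent of the chart itself, so it can be tried out separately. Here is a minimal sketch as a stand-alone function which returns a new table and leaves the original one alone. The name toRowPercentages is made up for this illustration.

```javascript
// Sketch: convert each data row of a table into percentages of its row total.
// 'toRowPercentages' is an illustrative name only.
function toRowPercentages(table) {
  // the title row is copied as it is
  var result = [table[0].slice()];

  for (var row = 1; row < table.length; row++) {
    // total of all value columns (column 0 holds the label)
    var sum = 0;
    for (var col = 1; col < table[row].length; col++) {
      sum += table[row][col];
    }

    var out = [table[row][0]];
    for (col = 1; col < table[row].length; col++) {
      out.push(sum > 0 ? 100 * table[row][col] / sum : 0);
    }
    result.push(out);
  }
  return result;
}

var totals = [
  ['year', 'foo', 'bar', 'rest'],
  ['1900', 20, 10, 20],
  ['2000', 20, 30, 10],
  ['2100', 20, 10, 10]
];
var percentages = toRowPercentages(totals);
console.log(percentages[3]); // [ '2100', 50, 25, 25 ]
```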
    

    Not having worked with Google charts before, I developed this in a couple of hours, which shows that Google charts work well and can be learned pretty fast provided you have an idea about charts and their terminology.

    Why Google termed their axes horizontal and vertical rather than using the mathematical standards x and y is the secret of their product team though.

    There is some discussion whether and how to use Google charts (data privacy etc.) since it seems that data are sent to and processed by Google servers (or servers of other chart type providers) but that is beyond the scope of this blog.

    The next step will be to look into the server side piece of Google charts i.e. writing a data provider.

    Wednesday, January 23, 2013

    Some Sieve rules which I used in the past

    Here is a list of Sieve rules which I used in the past.
    Some people do not see a use for managing incoming email automatically but I always found it useful. I guess it depends on your style of working.

    At Sun Microsystems we used to run email on Sun's own products (Sun Java System Messaging Server) and there was a web interface to create Sieve rules since the normal interface did not provide enough flexibility sometimes.


    The examples below use one important feature: each rule ends with a 'stop;' which means that if the rule is applicable and the action executed then no further filter rule parsing is applied. The email will be moved to the target folder.
    If you want the capability to duplicate emails in different folders then simply omit the 'stop;' and the email will be passed to the next filter.

    Filter on recipient (To, CC, ...)

    I used this to filter emails going to mailing lists like '...-interest'.

    recipient: abc-interest
    target folder: abc-stuff

    # RULE: email to maillist abc-interest
    require "fileinto";
    if anyof(header :contains ["To","Cc","Bcc","Resent-to","Resent-cc","Resent-bcc"] "abc-interest"){
    fileinto "abc-stuff";
    stop;
    }
    

    Filter on sender

    I used this to filter emails from my manager. Of course this simple approach only works if your manager's last name is fairly unique.

    sender: Beard
    target folder: FromLarry

    #RULE: email from manager
    require "fileinto";
    if anyof(header :contains ["From","Sender","Resent-from","Resent-sender","Return-path"] "Beard"){
    fileinto "FromLarry";
    stop;
    }
    

    Filter on subject

    I used this to filter emails which were sent from time to time (or regularly) by some automated script. Note that it uses '*' for pattern matching.

    subject: rsync *
    target folder: rsync

    #RULE: filter all emails from the rsync daemon
    require "fileinto";
    if anyof(header :matches ["Subject","Comments","Keywords"] "rsync *"){
    fileinto "rsync";
    stop;
    }
    

    A rule to reject large emails

    I hate large email attachments, and I hated them even more years ago when there were more restrictions on capacity (I started with a free email account of 10MB) and performance. Hence this rule.
    I wanted to remind people nicely to reconsider their email behaviour.
    In fact the rule is set up so that the emails do arrive and are not rejected (notice the missing 'stop;').

    #RULE: Reject_large_emails
    require "vacation";
    if size :over 2M {
       vacation "Dear sender,
    I'm sorry but I do not accept email over 2MB in size.
    Please upload larger files to a server and send me a link.
    
    Thanks.
    Andreas";
    }
    

    Filter on a combination of subject and sender

    This is a refinement of the 'subject' rule above. If you have e.g. cron jobs on some servers which regularly (or better only in case of errors) send emails to alert staff then this filter comes in quite handy (assuming that the sender of the email is e.g. some kind of specially named admin account).

    sender: canary
    subject: rsync *
    target folder: ARCHIVE/ITSM/rsync

    #RULE: filter on sender 'canary' and subject 'rsync'
    require "fileinto";
    if allof(header :contains ["From","Sender","Resent-from","Resent-sender","Return-path"] "canary",
    header :matches ["Subject","Comments","Keywords"] "rsync *"){
    fileinto "ARCHIVE/ITSM/rsync";
    stop;
    }
    

    Of course none of these rules is perfect, in the sense that each may occasionally pick a wrong email, but they served me well in the past.
    For example, since I was based in Europe and my managers were in the US, the first thing I did in the morning was to check the folder with emails from my manager (assuming that a manager's emails have a certain priority, which of course anyone can question :-) ).


    If a mail server supports the ManageSieve protocol one can use email clients (e.g. the Sieve add-on for Thunderbird) or command line tools to create Sieve filters.

    Tuesday, January 22, 2013

    My first PHP extension: fetching synonyms from WordNet

    This week I developed my first extension of PHP: an interface to fetch synonyms of words from the WordNet database.

    Versions: PHP 5.2.15, WordNet 3.0 , MacOS 10.5.8

    Summary: after downloading and compiling WordNet on my Mac I followed the instructions (found on some tutorial web pages) how to build a PHP shared library and finally wrote a little PHP script to invoke the shared library.

    Installing WordNet

  • Download here (version 3.0 for UNIX systems)
  • Compile with the option to install in a user directory (I always do this rather than changing the system setup)
    ./configure --prefix=/Users/testuser/WordNet
    make
    make install
    
  • Testing
    export WNHOME=/Users/testuser/WordNet
    $WNHOME/bin/wn application -synsn
    
    will return the following. It basically means that the word 'application' has 7 meanings ('senses' in the terminology of WordNet) and it lists possible synonyms for each meaning.
    Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun application
    
    7 senses of application                                                 
    
    Sense 1
    application, practical application
           => use, usage, utilization, utilisation, employment, exercise
    
    Sense 2
    application
           => request, petition, postulation
    
    Sense 3
    application, coating, covering
           => manual labor, manual labour
    
    Sense 4
    application, application program, applications programme
           => program, programme, computer program, computer programme
    
    Sense 5
    lotion, application
           => remedy, curative, cure, therapeutic
    
    Sense 6
    application, diligence
           => effort, elbow grease, exertion, travail, sweat
    
    Sense 7
    application
           => action
    

    Note: the original idea was to create a program that finds synonyms for words in order to have variations when writing a text, preferably with automated word replacement (a number of such websites are available on the internet). But as this example already shows, without context it is rather likely to pick the wrong meaning of a word and thus substitute something wrong (with probably funny results). So I never wrote this program in the end.

    Create PHP shared library

    In order to create the PHP shared library I followed the instructions on this website.

  • I took the example config.m4, .h and C file, replaced all hello/HELLO occurrences by wordnet/WORDNET respectively and saved them (as php_wordnet.h and wordnet.c). I kept a copy of these 3 files as a fallback version in case of configure or make issues.
    I then ran
    phpize
    ./configure --enable-wordnet
    make
    
    in order to see whether I could get the basic example compiled.
    In case of errors I cleared the working dir, took a copy of the saved files, made the appropriate changes and ran the 3 steps again until I got a working set of files (which eventually became my fallback version for the next step).
  • I then needed to get WordNet into my files. I had decided that I wanted to provide only one function which should return synonyms of a given word.
    • php_wordnet.h: no change necessary
    • config.m4: this required a complete rewrite in order to achieve that configure would recognize a --with-wordnet=/Users/testuser/WordNet option and that the WordNet include file wn.h and the library libWN.a would be properly recognized. During this process I learned a lot about config.m4 (and eventually wrote a config.m4 generator on the side) and here is the working file.
      dnl This is the config.m4 file for extension WordNet (wordnet)
      
      dnl The extension should be built with   --with-wordnet=DIR
      PHP_ARG_WITH( wordnet, whether to enable WordNet support,
      [ --with-wordnet=[DIR]   Enable WordNet support])
      
      dnl Check whether the extension is enabled at all
      if test "$PHP_WORDNET" != "no"; then
      
        AC_DEFINE(HAVE_WORDNET, 1, [Whether you have WordNet])
      
        dnl Add include path
        PHP_ADD_INCLUDE($PHP_WORDNET/include)
        
        dnl Check whether the lib exists in the lib directory
        AC_MSG_CHECKING(for libWN.a in $PHP_WORDNET and /usr/local /usr )
        for i in $PHP_WORDNET /usr/local /usr; do
            if test -r $i/lib/libWN.a; then
              WORDNET_DIR=$i
              AC_MSG_RESULT(found in $i)
              break
            fi
        done
      
        if test -z "$WORDNET_DIR"; then
          AC_MSG_RESULT(not found)
          AC_MSG_ERROR(Please reinstall the libWN.a distribution - includes should be
                       in /include and libWN.a should be in /lib)
        fi
      
        dnl Add lib path
        PHP_ADD_LIBRARY_WITH_PATH( WN, $WORDNET_DIR/lib, WORDNET_SHARED_LIBADD)
        AC_DEFINE( HAVE_WORDNETLIB, 1, [Whether WordNet support is present and requested])
        
        dnl Finally, tell the build system about the extension and what files are needed
        PHP_SUBST(WORDNET_SHARED_LIBADD)
        PHP_NEW_EXTENSION( wordnet, wordnet.c, $ext_shared)
      fi
      
    • wordnet.c: basically looking in depth into the C-code of WordNet gave me the final idea where to hook in with my PHP shared library which consists of one PHP function and one helper function (which does all the work and is mimicking the original WordNet code).
      #ifdef HAVE_CONFIG_H
      #include "config.h"
      #endif
      
      #include "php.h"
      #include "php_wordnet.h"
      #include "wn.h"
      
      static function_entry wordnet_functions[] = {
          PHP_FE(synsn, NULL)
          {NULL, NULL, NULL}
      };
      
      zend_module_entry wordnet_module_entry = {
      #if ZEND_MODULE_API_NO >= 20010901
          STANDARD_MODULE_HEADER,
      #endif
          PHP_WORDNET_EXTNAME,
          wordnet_functions,
          NULL,
          NULL,
          NULL,
          NULL,
          NULL,
      #if ZEND_MODULE_API_NO >= 20010901
          PHP_WORDNET_VERSION,
      #endif
          STANDARD_MODULE_PROPERTIES
      };
      
      #ifdef COMPILE_DL_WORDNET
      ZEND_GET_MODULE(wordnet)
      #endif
      
      static char *wnsynsn(char *name);
      
      PHP_FUNCTION(synsn)
      {       
          char *name;
          int name_len;
      
          if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &name, &name_len) == FAILURE) {
              RETURN_NULL();
          }
      
          char *retstr;
          if( (retstr = wnsynsn(name))==NULL) {
              RETURN_NULL();
          }
      
          RETURN_STRING(retstr, 1);
      }
      
      char *wnsynsn(char *name) {
          if (wninit()) {             /* open database */
              php_printf("%s","wn: Fatal error - cannot open WordNet database\n");
              return NULL;            /* synsn() then returns NULL instead of terminating PHP */
          }
      
          dflag = wnsnsflag   = 0;
          char *outbuf  = findtheinfo(name, NOUN, HYPERPTR, ALLSENSES );
          return( strlen(outbuf)>0 ? outbuf : NULL);
      }
      

    Now I could run

    phpize
    ./configure --with-wordnet=/Users/testuser/WordNet
    make
    
    (I did not take care of creating a proper 'make install' part)

    Create a PHP test script

    This simple script loads the shared library and prints the results of the WordNet synsn request for 'application'.
    <?php
    // Loading the extension based on OS
    if (!extension_loaded('wordnet')) {
        if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
            dl('php_wordnet.dll');
        } else {
            dl('wordnet.so');
        }
    }
    
    echo synsn("application");
    ?>
    
    Run as
    export WNHOME=/Users/testuser/WordNet      # necessary to detect the WordNet dictionary
    PHPDIR=the directory where the compilation took place
    php -d extension_dir=$PHPDIR/modules test.php
    

    Or if you want the output to be a bit more dense:

    $synsn   = synsn("application");
    $patterns       = array(
            '/.* senses of .*\n/',          # remove the '... senses of ...' line
            '/^Sense.*\n/m',                # remove the 'Sense ...' lines
            '/^ *\n/m'                      # remove empty lines
             );
    
    $synsn   = preg_replace( $patterns, "", $synsn);
    echo $synsn;
    
    will lead to
    application, practical application
           => use, usage, utilization, utilisation, employment, exercise
    application
           => request, petition, postulation
    application, coating, covering
           => manual labor, manual labour
    application, application program, applications programme
           => program, programme, computer program, computer programme
    lotion, application
           => remedy, curative, cure, therapeutic
    application, diligence
           => effort, elbow grease, exertion, travail, sweat
    application
           => action
    

    Of course there are other solutions to access WordNet: call the WordNet executable wn with shell_exec, use an installation of WordNet in MySQL etc. I wanted to do that specific integration via shared library.
    Monday, January 21, 2013

    A config.m4 generator (in m4)

    When you create a PHP extension you need to provide a config.m4 file.
    There are plenty of examples of what a config.m4 should look like and one can (and does) mimic these examples.

    Since they are all somewhat similar I wanted to take a step back and find a way how to automatically create a valid config.m4 by using

  • a set of definitions (name of the extension, name of C file etc.)
  • a config.m4 template

    Of course there are plenty of ways to do this and I decided to explore the m4 way i.e. writing an m4 file which after being parsed by m4 would generate the config.m4 file.

    The m4 file consists of three parts (aside from documentation).

  • The first part lists some variables which need to be changed by the user according to his needs.
    The user needs to provide:
    • The name of the extension (which will name the shared library)
    • The title of the extension (more a descriptive name for humans)
    • The C file
    • The name of the external library
      The path will be provided at configuration time via --with-PHP_EXT=DIR
  • The second part creates a set of derived variables (derived from part 1)
    These variables will replace the respective content in the template.
  • The third part is the PHP template.
    The template contains code to generate a PHP interface to an external application whose library should be linked in.

    The m4 file

    I named it preconfig.m4
    divert(-1)
    
    dnl This is a generalized config.m4 file which defines all settings at the beginning
    dnl in order to simplify the building of php extensions
    dnl Run as:
    dnl    m4 preconfig.m4 > config.m4
    dnl    phpize
    dnl    ./configure --with-PHP_EXT=DIR
    dnl    make
    dnl where PHP_EXT is the name of the extension and DIR is the directory of the external package
    dnl which should contain an include and a lib directory
    
    dnl ----------------------------------
    dnl A set of definitions for the new php extension to be built incl. the C file(s)
    dnl     1) the name of the shared library  e.g.  foo
    dnl     2) a nickname for the shared library e.g. Foo bar
    dnl     3) the name of the C file e.g.           foo.c
    dnl     4) the name of the library to be linked into e.g.  bar
    dnl        (the corresponding libbar.a must be found in DIR/lib)
    dnl     Note: DIR/include should contain the corresponding .h file
    dnl           which needs to be called in foo.c
    dnl ----------------------------------
    
    dnl  The name of the php extension
    define(`PHP_EXT',`foo')
    
    dnl  A name of the extension (for humans)
    define(`EXT_NAME',`Foo bar')
    
    dnl  The corresponding C file (there should be an include file too like php_config.h)
    define(`PHP_C_FILES',`foo.c')
    
    dnl  To be linked as  -lbar  with a corresponding  libbar.a  in DIR/lib
    define(`LIBNAME',`bar')
    
    
    dnl ----------------------------------
    dnl Everything below here is derived from the settings above
    dnl ----------------------------------
    
    dnl  The name of the extension in capitals
    define(`PHP_EXT_UPP',translit( PHP_EXT, [a-z], [A-Z]))
    
    dnl  The internal php variable to define the existence of the extension
    define(`PHP_VAR',`PHP_'PHP_EXT_UPP)
    
    dnl  The directory of lib/libbar.a
    define(`PHP_DIR',PHP_EXT_UPP`_DIR')
    
    dnl  An autoconf variable
    define(`HAVE_EXT',`HAVE_'PHP_EXT_UPP)
    
    dnl  An autoconf variable
    define(`EXT_SHARED_ADD',PHP_EXT_UPP`_SHARED_LIBADD')
    
    dnl  An autoconf variable
    define(`HAVE_EXTLIB',`HAVE_'PHP_EXT_UPP`LIB')
    
    dnl  The filename of the library  bar -> libbar.a
    define(`LIBS',`lib'LIBNAME`.a')
    
    dnl  The default paths for libbar.a
    define(`LIB_PATHS',`/usr/local /usr')
    
    dnl ----------------------------------
    dnl Here starts the actual config.m4 code
    dnl ----------------------------------
    divert()
    `dnl This is the config.m4 file for extension 'EXT_NAME` ('PHP_EXT`)'
    
    `dnl The extension should be built with   --with-'PHP_EXT`=DIR'
    PHP_ARG_WITH( PHP_EXT, whether to enable EXT_NAME support,
    [ --with-PHP_EXT=[DIR]   Enable EXT_NAME support])
    
    `dnl Check whether the extension is enabled at all'
    if test "$PHP_VAR" != "no"; then
    
      AC_DEFINE(HAVE_EXT, 1, [Whether you have EXT_NAME])
    
      `dnl Add include path'
      PHP_ADD_INCLUDE($PHP_VAR/include)
    
      `dnl Check whether the lib exists in the lib directory'
      AC_MSG_CHECKING(for LIBS in $PHP_VAR and LIB_PATHS )
      for i in $PHP_VAR LIB_PATHS; do
          if test -r $i/lib/LIBS; then
            PHP_DIR=$i
            AC_MSG_RESULT(found in $i)
          fi
      done
    
      if test -z "$PHP_DIR"; then
        AC_MSG_RESULT(not found)
        AC_MSG_ERROR(Please reinstall the LIBS distribution - includes should be
                     in /include and LIBS should be in /lib)
      fi
    
      `dnl Add lib path'
      PHP_ADD_LIBRARY_WITH_PATH( LIBNAME, $PHP_VAR/lib, EXT_SHARED_ADD)
      AC_DEFINE( HAVE_EXTLIB, 1, [Whether EXT_NAME support is present and requested])
    
      `dnl Finally, tell the build system about the extension and what files are needed'
      PHP_SUBST(EXT_SHARED_ADD)
      PHP_NEW_EXTENSION( PHP_EXT, PHP_C_FILES, $ext_shared)
    fi
    

    You should run

    m4 preconfig.m4 >config.m4
    
    which will report
    m4:../s1/preconfig.m4:65: empty string treated as 0 in builtin `divert'
    
    This warning can be ignored. It is triggered by the empty divert() call, which together with divert(-1) keeps a set of superfluous empty lines out of the output.

    The resulting config.m4

    As you can see, all references to PHP_VAR, PHP_EXT etc. have been replaced by their proper 'FOO' versions.

    
    dnl This is the config.m4 file for extension Foo bar (foo)
    
    dnl The extension should be built with   --with-foo=DIR
    PHP_ARG_WITH( foo, whether to enable Foo bar support,
    [ --with-foo=[DIR]   Enable Foo bar support])
    
    dnl Check whether the extension is enabled at all
    if test "$PHP_FOO" != "no"; then
    
      AC_DEFINE(HAVE_FOO, 1, [Whether you have Foo bar])
    
      dnl Add include path
      PHP_ADD_INCLUDE($PHP_FOO/include)
      
      dnl Check whether the lib exists in the lib directory
      AC_MSG_CHECKING(for libfoo.a in $PHP_FOO and /usr/local /usr )
      for i in $PHP_FOO /usr/local /usr; do
          if test -r $i/lib/libfoo.a; then
            FOO_DIR=$i
            AC_MSG_RESULT(found in $i)
          fi
      done
    
      if test -z "$FOO_DIR"; then
        AC_MSG_RESULT(not found)
        AC_MSG_ERROR(Please reinstall the libfoo.a distribution - includes should be
                     in /include and libfoo.a should be in /lib)
      fi
    
      dnl Add lib path
      PHP_ADD_LIBRARY_WITH_PATH( libfoo.a, $PHP_FOO/lib, FOO_SHARED_LIBADD)
      AC_DEFINE( HAVE_FOOLIB, 1, [Whether Foo bar support is present and requested])
      
      dnl Finally, tell the build system about the extension and what files are needed
      PHP_SUBST(FOO_SHARED_LIBADD)
      PHP_NEW_EXTENSION( foo, foo.c, $ext_shared)
    fi
    

    This file works nicely with phpize (I tested with PHP5). This particular example will of course end with a configuration error since the library is not found, but it should run fine for a library which actually exists and is properly installed.

    ./configure --with-foo=DIR
    ...
    ...
    checking whether to enable Foo bar support... yes, shared
    checking for libfoo.a in /Users/bar and /usr/local /usr ... not found
    configure: error: Please reinstall the libfoo.a distribution - includes should be
                     in /include and libfoo.a should be in /lib
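
The library search done by configure can also be tried out in isolation. Below is a minimal sketch in plain shell, assuming a hypothetical helper `find_lib_dir` (not part of the template) and a hypothetical directory `$HOME/bar` standing in for DIR. Unlike the loop in the template, which keeps iterating and therefore remembers the last matching directory, this sketch stops at the first hit.

```shell
# Hypothetical helper mirroring the search loop of the template:
# print the first directory that contains lib/lib<name>.a, fail if none does.
find_lib_dir() {
  name=$1
  shift
  for dir in "$@"; do
    if [ -r "$dir/lib/lib$name.a" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# Usage: search the user supplied directory first, then the default paths.
if found=$(find_lib_dir bar "$HOME/bar" /usr/local /usr); then
  echo "found in $found"
else
  echo "not found"
fi
```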
    
    

    Of course the template could be extended to include other config.m4 stuff (debugging, check the validity of the library etc.) but it works nicely for me.

  • Friday, January 11, 2013

    How to create and read QR codes on a Mac

    Just recently I was tasked with finding a command line tool to create QR codes (there are a lot of web pages which offer online creation, but a GUI sometimes does not cut it).

    My investigations had led me into much too complex territory before I could finally reduce the issue to a very simple solution using ZXing (a Java library for QR codes). ZXing has grown into a mighty library which supports QR coding in various ways, but its documentation was a little sparse for me, so the following steps should give everyone a simple recipe. (Note: all of the below was done on a Mac with Java 1.6.)

  • Download and unzip ZXing into a directory of your choice.
    (currently version 2.1. Because this is a development build you will find a lot of directories and files)
  • Take the files core/core.jar and javase/javase.jar and put them into a new directory (preferably outside of the ZXing tree; afterwards you could remove the ZXing directory if you wanted)
  • Go to that new directory and run the commands described below

    Create a QR code file

    java -cp ./core.jar:./javase.jar \
         com.google.zxing.client.j2se.CommandLineEncoder \
         http://ajhaupt.blogspot.com
    
    will take the URL as input and create a file out.png containing the QR code.
    Here is the image:

    There are options to change the image type, dimensions and name of the output file. Other QR attributes like encoding or error correction level are fixed, and you would need to create your own class to change them.
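
A small wrapper makes the output file and dimensions easier to set. This is only a sketch: the function `qr_encode` is hypothetical, the fallback size of 300 is just a value chosen here, and it assumes that the encoder accepts the options `--output`, `--width` and `--height` (run the class without arguments to see the usage message of your version).

```shell
# Hypothetical wrapper: encode TEXT into FILE as a QR code of SIZE x SIZE pixels.
# Assumes core.jar and javase.jar sit in the current directory and that the
# encoder understands --output, --width and --height.
qr_encode() {
  text=$1
  file=$2
  size=${3:-300}   # fallback size picked for this sketch
  java -cp ./core.jar:./javase.jar \
       com.google.zxing.client.j2se.CommandLineEncoder \
       --output="$file" --width="$size" --height="$size" \
       "$text"
}
```

It would be called like `qr_encode http://ajhaupt.blogspot.com blog.png 200`.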

    Decode a QR code file

    QR code decoding is mostly done via an image capture device (camera) and a corresponding application (normally on smartphones), but this ZXing command runs on a computer and lets you find and decode QR codes in image files.
    Here is the output for the file above (easy since it simply contains the QR code), but it would also work on images containing other things (with the QR code sitting somewhere) and on images with multiple QR codes.
    java -cp ./core.jar:./javase.jar \
         com.google.zxing.client.j2se.CommandLineRunner \
         out.png
    file:/Users/......../out.png (format: QR_CODE, type: URI):
    Raw result:
    http://ajhaupt.blogspot.com
    Parsed result:
    http://ajhaupt.blogspot.com
    Found 4 result points.
      Point 0: (68.5,230.5)
      Point 1: (68.5,68.5)
      Point 2: (230.5,68.5)
      Point 3: (203.5,203.5)
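
If only the decoded text is of interest, the output can be filtered. The following is a minimal sketch with a hypothetical function `qr_raw_result`, assuming the output layout shown above: it prints whatever sits between the lines "Raw result:" and "Parsed result:".

```shell
# Hypothetical filter: read CommandLineRunner output on stdin and print
# the line(s) between "Raw result:" and "Parsed result:".
qr_raw_result() {
  awk '/^Parsed result:/ { show = 0 }
       show              { print }
       /^Raw result:/    { show = 1 }'
}
```

It is meant to sit at the end of a pipe, i.e. append `| qr_raw_result` to the decode command.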
    

    Since all of this is in Java it will of course run on other platforms too.

    Neither command seems to work with more complex input like vCard, although corresponding modules exist in the library. I guess the command line utilities were provided by the authors as a proof of concept, and it is up to the user to extend them if necessary.