Tuesday, January 22, 2013

My first PHP extension: fetching synonyms from WordNet

This week I wrote my first PHP extension: an interface that fetches synonyms of words from the WordNet database.

Versions: PHP 5.2.15, WordNet 3.0, MacOS 10.5.8

Summary: after downloading and compiling WordNet on my Mac, I followed the instructions on how to build a PHP shared library (found on some tutorial web pages) and finally wrote a little PHP script that calls the shared library.

Installing WordNet

  • Download here (version 3.0 for UNIX systems)
  • Compile with the option to install in a user directory (I always do this rather than changing the system setup)
    ./configure --prefix=/Users/testuser/WordNet
    make
    make install
    
  • Testing
    export WNHOME=/Users/testuser/WordNet
    $WNHOME/bin/wn application -synsn
    
    returns the following. It means that the word 'application' has 7 meanings ('senses' in WordNet terminology). For each sense, the first line lists its synonyms and the line starting with => lists its hypernyms, i.e. more general terms.
    Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun application
    
    7 senses of application                                                 
    
    Sense 1
    application, practical application
           => use, usage, utilization, utilisation, employment, exercise
    
    Sense 2
    application
           => request, petition, postulation
    
    Sense 3
    application, coating, covering
           => manual labor, manual labour
    
    Sense 4
    application, application program, applications programme
           => program, programme, computer program, computer programme
    
    Sense 5
    lotion, application
           => remedy, curative, cure, therapeutic
    
    Sense 6
    application, diligence
           => effort, elbow grease, exertion, travail, sweat
    
    Sense 7
    application
           => action
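If a program only needs the number of meanings, it can be read from the "N senses of WORD" line of this output. The following is a minimal C sketch, assuming the output format shown above; count_senses is a hypothetical helper (not part of WordNet), and accepting the singular form "1 sense of WORD" is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Return N from the "N senses of WORD" line of WordNet's -synsn output.
 * The singular form "1 sense of WORD" is an assumption and also accepted.
 * Returns -1 if no such line is found. */
int count_senses(const char *wn_output)
{
    const char *line = wn_output;

    while (line != NULL && *line != '\0') {
        int n;
        char word[8];

        /* a matching line starts with a number followed by "sense(s)" */
        if (sscanf(line, "%d %7s", &n, word) == 2 &&
            (strcmp(word, "senses") == 0 || strcmp(word, "sense") == 0))
            return n;

        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Fed with the output above, count_senses would return 7.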
    

    Note: the original idea was to write a program that finds synonyms for words in order to get some variation when writing a text, preferably with automated word replacement (a number of websites on the internet offer this). But as this example already shows, without context such a program is quite likely to pick the wrong meaning of a word and substitute something wrong (probably with funny results). So in the end I never wrote that program.

Create PHP shared library

    In order to create the PHP shared library I followed the instructions on this website.

  • I took the example config.m4, .h and C files, replaced all occurrences of hello/HELLO by wordnet/WORDNET respectively and saved them (as config.m4, php_wordnet.h and wordnet.c). I also kept a copy of these 3 files as a fallback version in case configure or make ran into problems.
    I then ran
    phpize
    ./configure --enable-wordnet
    make
    
    in order to see whether I could get the basic example compiled.
    If there were errors, I cleared the working directory, restored the saved files, made the appropriate changes and ran the 3 steps again until I had a working set of files (which then became my fallback version for the next step).
  • Next I had to get WordNet into these files. I had decided to provide only one function, which returns the synonyms of a given word.
    • php_wordnet.h: no change necessary
    • config.m4: this required a complete rewrite so that configure would accept a --with-wordnet=/Users/testuser/WordNet option and properly find the WordNet include file wn.h and the library libWN.a. During this process I learned a lot about config.m4 (and eventually wrote a config.m4 generator on the side). Here is the working file:
      dnl This is the config.m4 file for extension WordNet (wordnet)
      
      dnl The extension should be built with   --with-wordnet=DIR
      PHP_ARG_WITH( wordnet, whether to enable WordNet support,
      [ --with-wordnet=[DIR]   Enable WordNet support])
      
      dnl Check whether the extension is enabled at all
      if test "$PHP_WORDNET" != "no"; then
      
        AC_DEFINE(HAVE_WORDNET, 1, [Whether you have WordNet])
      
        dnl Look for libWN.a in DIR first, then in /usr/local and /usr;
        dnl stop at the first match so that an explicit DIR wins
        AC_MSG_CHECKING(for libWN.a in $PHP_WORDNET /usr/local /usr)
        for i in $PHP_WORDNET /usr/local /usr; do
            if test -r $i/lib/libWN.a; then
              WORDNET_DIR=$i
              AC_MSG_RESULT(found in $i)
              break
            fi
        done
      
        if test -z "$WORDNET_DIR"; then
          AC_MSG_RESULT(not found)
          AC_MSG_ERROR([Please reinstall WordNet - wn.h should be in DIR/include and libWN.a in DIR/lib])
        fi
      
        dnl Add include and lib paths of the installation that was actually found
        PHP_ADD_INCLUDE($WORDNET_DIR/include)
        PHP_ADD_LIBRARY_WITH_PATH( WN, $WORDNET_DIR/lib, WORDNET_SHARED_LIBADD)
        AC_DEFINE( HAVE_WORDNETLIB, 1, [Whether WordNet support is present and requested])
        
        dnl Finally, tell the build system about the extension and what files are needed
        PHP_SUBST(WORDNET_SHARED_LIBADD)
        PHP_NEW_EXTENSION( wordnet, wordnet.c, $ext_shared)
      fi
      
    • wordnet.c: studying the WordNet C code in depth finally showed me where to hook in. The file consists of one PHP function (synsn) and one helper function (wnsynsn), which does all the work by mimicking the original WordNet code.
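      #ifdef HAVE_CONFIG_H
      #include "config.h"
      #endif
      
      #include <string.h>
      #include "php.h"
      #include "php_wordnet.h"
      #include "wn.h"
      
      PHP_FUNCTION(synsn);                /* prototype needed by PHP_FE below */
      static char *wnsynsn(char *name);
      
      /* zend_function_entry rather than the old alias function_entry,
         which later PHP versions no longer define */
      static zend_function_entry wordnet_functions[] = {
          PHP_FE(synsn, NULL)
          {NULL, NULL, NULL}
      };
      
      zend_module_entry wordnet_module_entry = {
      #if ZEND_MODULE_API_NO >= 20010901
          STANDARD_MODULE_HEADER,
      #endif
          PHP_WORDNET_EXTNAME,
          wordnet_functions,
          NULL,   /* MINIT */
          NULL,   /* MSHUTDOWN */
          NULL,   /* RINIT */
          NULL,   /* RSHUTDOWN */
          NULL,   /* MINFO */
      #if ZEND_MODULE_API_NO >= 20010901
          PHP_WORDNET_VERSION,
      #endif
          STANDARD_MODULE_PROPERTIES
      };
      
      #ifdef COMPILE_DL_WORDNET
      ZEND_GET_MODULE(wordnet)
      #endif
      
      /* synsn(string $word): WordNet synonyms/hypernyms of a noun, or NULL */
      PHP_FUNCTION(synsn)
      {
          char *name;
          int name_len;
          char *retstr;
      
          if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &name, &name_len) == FAILURE) {
              RETURN_NULL();
          }
      
          if ((retstr = wnsynsn(name)) == NULL) {
              RETURN_NULL();
          }
      
          /* 1 = duplicate: PHP gets its own copy of WordNet's result buffer */
          RETURN_STRING(retstr, 1);
      }
      
      /* Helper mimicking "wn WORD -synsn" from the WordNet sources */
      static char *wnsynsn(char *name)
      {
          char *outbuf;
      
          if (wninit()) {             /* open database */
              php_printf("%s", "wn: Fatal error - cannot open WordNet database\n");
              return NULL;            /* exit() here would kill the whole PHP process */
          }
      
          dflag = wnsnsflag = 0;      /* no glosses, no sense numbers */
          outbuf = findtheinfo(name, NOUN, HYPERPTR, ALLSENSES);
          return (outbuf != NULL && strlen(outbuf) > 0) ? outbuf : NULL;
      }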
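The hello/HELLO to wordnet/WORDNET renaming from the first step is easy to get slightly wrong when done by hand. A minimal C sketch of that substitution follows; replace_all and rename_hello_to_wordnet are hypothetical helpers, not part of the tutorial or of PHP, and mixed-case spellings such as "Hello" are deliberately left untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src with every occurrence of `from`
 * replaced by `to` (caller frees). Returns NULL if `from` is empty or
 * memory allocation fails. */
char *replace_all(const char *src, const char *from, const char *to)
{
    size_t from_len = strlen(from), to_len = strlen(to);
    size_t count = 0;
    const char *p;
    char *out, *q;

    if (from_len == 0)
        return NULL;

    /* first pass: count non-overlapping matches to size the buffer */
    for (p = strstr(src, from); p != NULL; p = strstr(p + from_len, from))
        count++;

    out = malloc(strlen(src) + count * to_len - count * from_len + 1);
    if (out == NULL)
        return NULL;

    /* second pass: copy the text before each match, then the replacement */
    q = out;
    while ((p = strstr(src, from)) != NULL) {
        memcpy(q, src, (size_t)(p - src));
        q += p - src;
        memcpy(q, to, to_len);
        q += to_len;
        src = p + from_len;
    }
    strcpy(q, src);   /* remaining tail, including the terminating NUL */
    return out;
}

/* Hypothetical helper for the renaming step: hello -> wordnet and
 * HELLO -> WORDNET (mixed case such as "Hello" is left untouched). */
char *rename_hello_to_wordnet(const char *text)
{
    char *lower = replace_all(text, "hello", "wordnet");
    char *result;

    if (lower == NULL)
        return NULL;
    result = replace_all(lower, "HELLO", "WORDNET");
    free(lower);
    return result;
}
```

Applied to a tutorial line such as "PHP_FE(hello_world, NULL)" it yields "PHP_FE(wordnet_world, NULL)"; the function table still has to be edited by hand afterwards, as described above.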
      

    Now I could run

    phpize
    ./configure --with-wordnet=/Users/testuser/WordNet
    make
    
    (I did not bother to create a proper 'make install' step; the compiled wordnet.so simply stays in the modules directory of the build directory.)

Create a PHP test script

    This simple script loads the shared library and prints the result of the WordNet synsn request for 'application'.
    <?php
    // Loading the extension based on OS
    if (!extension_loaded('wordnet')) {
        if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
            dl('php_wordnet.dll');
        } else {
            dl('wordnet.so');
        }
    }
    
    echo synsn("application");
    ?>
    
    Run as
    export WNHOME=/Users/testuser/WordNet   # necessary to detect the WordNet dictionary
    PHPDIR=<directory where the compilation took place>
    php -d extension_dir=$PHPDIR/modules test.php
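WNHOME matters because the WordNet library locates its dict directory at runtime, not at build time, and the variable must be exported so that the php process actually sees it. The sketch below shows that lookup in C. It rests on an assumption about WordNet 3.0 on UNIX: WNSEARCHDIR is checked first, then WNHOME followed by /dict, otherwise a default compiled into the library. wordnet_dict_dir is a hypothetical helper, not a WordNet function.

```c
#define _POSIX_C_SOURCE 200809L   /* POSIX declarations such as setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compute the directory WordNet would search for its
 * dictionary files. ASSUMED lookup order (WordNet 3.0 on UNIX): WNSEARCHDIR,
 * then WNHOME + "/dict", then a default compiled into the library, which the
 * caller passes in as default_dir. Returns 0 on success, -1 if buf is too small. */
int wordnet_dict_dir(char *buf, size_t size, const char *default_dir)
{
    const char *env;
    int n;

    if ((env = getenv("WNSEARCHDIR")) != NULL)
        n = snprintf(buf, size, "%s", env);
    else if ((env = getenv("WNHOME")) != NULL)
        n = snprintf(buf, size, "%s/dict", env);
    else
        n = snprintf(buf, size, "%s", default_dir);

    /* snprintf returns the length it wanted to write; detect truncation */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With WNHOME=/Users/testuser/WordNet exported and WNSEARCHDIR unset, this sketch resolves to /Users/testuser/WordNet/dict.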
    

    Or if you want the output to be a bit more dense:

    $synsn   = synsn("application");
    $patterns       = array(
            '/.* senses of .*\n/',          # remove the '... senses of ...' line
            '/^Sense.*\n/m',                # remove the 'Sense ...' lines
            '/^ *\n/m'                      # remove empty lines
             );
    
    $synsn   = preg_replace( $patterns, "", $synsn);
    echo $synsn;
    
    will lead to
    application, practical application
           => use, usage, utilization, utilisation, employment, exercise
    application
           => request, petition, postulation
    application, coating, covering
           => manual labor, manual labour
    application, application program, applications programme
           => program, programme, computer program, computer programme
    lotion, application
           => remedy, curative, cure, therapeutic
    application, diligence
           => effort, elbow grease, exertion, travail, sweat
    application
           => action
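The same condensing could also be done in C, e.g. inside the extension before the string is handed back to PHP. A minimal sketch that applies the three rules of the preg_replace patterns line by line follows; condense_synsn and drop_line are hypothetical names, and this is not how the extension above works.

```c
#include <stdlib.h>
#include <string.h>

/* Nonzero if a line (without its '\n') matches one of the three PHP
 * patterns: it contains " senses of ", starts with "Sense", or is blank. */
static int drop_line(const char *start, size_t len)
{
    static const char marker[] = " senses of ";
    size_t mlen = sizeof marker - 1;     /* 11 characters */
    size_t i;

    for (i = 0; i + mlen <= len; i++)
        if (memcmp(start + i, marker, mlen) == 0)
            return 1;
    if (len >= 5 && memcmp(start, "Sense", 5) == 0)
        return 1;
    for (i = 0; i < len; i++)            /* blank = spaces only */
        if (start[i] != ' ')
            return 0;
    return 1;
}

/* Return a condensed, newly allocated copy of WordNet synsn output
 * (caller frees), or NULL if memory allocation fails. */
char *condense_synsn(const char *text)
{
    char *out = malloc(strlen(text) + 1);
    char *q = out;
    const char *line = text;

    if (out == NULL)
        return NULL;
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        size_t step = len + (nl ? 1 : 0);   /* line plus its '\n', if any */

        /* like the PHP patterns, only lines ending in '\n' are removed */
        if (nl == NULL || !drop_line(line, len)) {
            memcpy(q, line, step);
            q += step;
        }
        line += step;
    }
    *q = '\0';
    return out;
}
```

Because only whole lines are removed, the output can never be longer than the input, so a single allocation of the input size is enough.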
    

    Of course there are other ways to access WordNet: calling the WordNet executable wn with shell_exec, using WordNet loaded into MySQL, etc. I specifically wanted to do the integration via a shared library.
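For comparison, the shell_exec route boils down to running wn in a child process and capturing its standard output. A minimal C sketch with POSIX popen() follows; capture_output is a hypothetical helper, and the wn command line in the comment assumes the installation from the beginning of this post. Never paste unchecked user input into such a command string.

```c
#define _POSIX_C_SOURCE 200809L   /* popen()/pclose() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Run a shell command and return its standard output as a newly
 * allocated string (caller frees), or NULL on failure. Roughly what
 * PHP's shell_exec() does, e.g. for "$WNHOME/bin/wn application -synsn". */
char *capture_output(const char *cmd)
{
    FILE *pipe = popen(cmd, "r");
    char *buf = NULL, *tmp;
    size_t len = 0, cap = 0;
    int c;

    if (pipe == NULL)
        return NULL;
    while ((c = fgetc(pipe)) != EOF) {
        if (len + 1 >= cap) {          /* grow, keeping room for the NUL */
            cap = cap ? cap * 2 : 256;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                pclose(pipe);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }
    pclose(pipe);
    if (buf == NULL)                   /* command printed nothing */
        return calloc(1, 1);
    buf[len] = '\0';
    return buf;
}
```

This avoids compiling anything against PHP, at the price of starting a new wn process (and reopening the database) for every lookup.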
2 comments:

    1. I try to build ext but when I execute "make" error this

      wordnet.c:19:5: note: (near initialization for ‘wordnet_module_entry.functions’)
      Makefile:181: recipe for target 'wordnet.lo' failed
      make: *** [wordnet.lo] Error 1

      how to fix my error?
      thanks
