Saturday, February 23, 2013

Reading utmp/wtmp and utmpx/wtmpx with Perl's Convert::Binary::C module

Last week I revisited a project from 2007: analyzing wtmpx on Solaris 10. Since wtmpx is a binary file, its contents need to be extracted with some tool. wtmpx is a sequence of equally sized chunks, each chunk representing one 'struct' element which is defined in a corresponding C header file.
My old approach was to use the Solaris tool fwtmp and the flow was something like
cat /var/adm/wtmpx | /usr/lib/acct/fwtmp |  wtmpx.pl ....
fwtmp contains all the transformation logic binary -> ASCII and would put out long lines of the form
andreash                         dt   console                               1506
   7 0000 0000 1311377458 0 0 3 :0 Sa Jul 23 01:30:58 2011
(this is one line).
These lines are fixed-field text and can easily be parsed by awk, Perl or whatever you like; my subsequent wtmpx.pl then analyzed the contents.
In the meantime I had learned about Perl's pack/unpack functions, and the following recipe can be found in various places on the Internet:
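As a rough illustration of how such a line can be taken apart in Perl, here is a minimal sketch. It is not the original wtmpx.pl; the field names and their order are my assumption (they follow the members of struct futmpx), and splitting on whitespace only works as long as no field is empty.

```perl
use strict;
use warnings;

# The sample line from above, joined back into one line.
my $line = 'andreash                         dt   console'
         . '                               1506'
         . '    7 0000 0000 1311377458 0 0 3 :0 Sa Jul 23 01:30:58 2011';

# Assumed field order, modelled on the futmpx members.
my @names = qw(user id line pid type e_termination e_exit
               tv_sec tv_usec session syslen host);

# Split into the named fields; the limit keeps the trailing date text
# together as one extra element, which the hash slice simply ignores.
my %entry;
@entry{@names} = split ' ', $line, @names + 1;

print "$entry{user} pid=$entry{pid} type=$entry{type} at $entry{tv_sec}\n";
# prints: andreash pid=1506 type=7 at 1311377458
```

A parser for real data would rather cut the line at fixed column positions, since user or host fields may be empty.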
$wtmpx     = "/var/adm/wtmpx";
$typedef   = 'A32 A4 A32 i s s s b b i i i i5 s A257 b';
$sizeof    = length pack ($typedef, () );

open(WTMPX, $wtmpx) || die("Cannot open $wtmpx\n");

# read chunks of length 'sizeof'
# (note: e_termination precedes e_exit in struct futmpx)
while( read(WTMPX, $buffer, $sizeof) == $sizeof ) {
  ($ut_user, $ut_id, $ut_line, $ut_pid, $ut_type, $ut_e_termination, $ut_e_exit,
   $dummy, $dummy, $ut_tv_sec, $ut_tv_usec, $ut_session, $ut_pad[0], $ut_pad[1],
   $ut_pad[2], $ut_pad[3], $ut_pad[4], $ut_syslen, $ut_host, $dummy)
  = unpack($typedef, $buffer);

  # ... and now access the fields 
}
close(WTMPX);
So what fwtmp does has to be done in Perl by defining a template which unpack uses to decipher the binary content. The template mimics the members of the 'struct' (struct futmpx in /usr/include/utmpx.h). Each letter represents a type of variable, e.g. A32 stands for a string of length 32, i for an integer, s for a short integer and so on (all described under pack in perlfunc). What makes this somewhat complex is the dummy bytes introduced here and there. They are needed to match how variables are represented and aligned on the machine. Here integers are aligned on 4-byte boundaries, so the sequence of three 2-byte short integers 's s s' followed by a 4-byte integer needs to be filled with 2 extra bytes (the integer has to start at a 4-byte boundary), which leads to 's s s b b i': each 'b' consumes one byte which is then thrown away. As you can see this definition is very platform and OS specific, and such a script is not easily portable.
When I wanted to port this to Linux (Ubuntu 11 in my case) I noted a couple of things. On Solaris utmp/wtmp have been deprecated and utmpx/wtmpx, which reside in /var/adm, are used instead. On Linux it's the other way round: there are no utmpx/wtmpx files but only utmp in /var/run and wtmp in /var/log. I did not find an fwtmp equivalent on Linux, and of course the 'struct' definition differs from Solaris (luckily the names of the struct members match to a great extent). So finding a portable solution seemed an impossible task from the start.
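The padding arithmetic can be checked with pack itself. The sketch below is not part of the recipe: it uses the fixed-width codes 's' (16 bit) and 'l' (32 bit) instead of the native 'i' so that the numbers are the same on every platform, and it shows 'x2' and 'x!4' as alternatives to the 'b b' filler.

```perl
use strict;
use warnings;

# Three 16-bit shorts followed by one 32-bit integer, no padding.
my $unpadded = length pack('s s s l', 1, 2, 3, 4);

# Same fields with two explicit pad bytes before the integer
# (x2 writes two NUL bytes and consumes no argument).
my $padded = length pack('s s s x2 l', 1, 2, 3, 4);

# x!4 pads up to the next multiple of 4 bytes, so the number of
# pad bytes does not have to be worked out by hand.
my $aligned = length pack('s s s x!4 l', 1, 2, 3, 4);

# The padded template round-trips: unpack skips the pad bytes again.
my @fields = unpack('s s s x!4 l', pack('s s s x!4 l', 1, 2, 3, 4));

print "unpadded=$unpadded padded=$padded aligned=$aligned fields=@fields\n";
# prints: unpadded=10 padded=12 aligned=12 fields=1 2 3 4
```

With 'x' the pad bytes produce no values on unpack, so the $dummy variables of the recipe would not be needed.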

Then I found Convert::Binary::C, one of the most astonishing Perl modules I have ever used. It lets you use C declarations in Perl, and it does that in a very nifty way which I'll show below.

First let's recap what I want to achieve:

  • 4. create a platform independent Perl script which can
  • 3. read utmpx/wtmpx and utmp/wtmp files
  • 2. and thus needs to determine the size of the C utmpx struct in an easy way
  • 1. i.e. by using the system's /usr/include/utmpx.h file

    Step 1: Read utmpx.h

    Below would have been the easiest code imaginable: tell Convert::Binary::C to look for utmpx.h in the /usr/include directory.
    use Convert::Binary::C;
    $c = Convert::Binary::C->new( Include => ['/usr/include'] );
    $c->parse_file('utmpx.h');
    
    This does not work though and thus I could not quite achieve goal 4.

    Solaris fails with:

    sys/isa_defs.h, line 503: #error "ISA not supported"
            included from /usr/include/sys/feature_tests.h:12
            included from /usr/include/utmpx.h:36 at - line 3.
    
    which needs to be rectified by defining one of the macros __i386, i386, __ppc, __sparc, or sparc for the machine type. This code here works (note the definition of __sparc):
    use Convert::Binary::C;
    $c = Convert::Binary::C->new( 
               Include => ['/usr/include'], 
               Define => [qw(__sparc)] 
            );
    $c->parse_file('utmpx.h');
    

    Ubuntu fails with:

    features.h, line 323: file 'bits/predefs.h' not found
     included from /usr/include/utmp.h:22 at - line 3.
    
    which needs to be rectified by adding another include directory:
    use Convert::Binary::C;
    $c = Convert::Binary::C->new( 
               Include => ['/usr/include', '/usr/include/i386-linux-gnu']);
    $c->parse_file('utmpx.h');
    

    Funnily enough both snippets can be combined: setting __sparc on Ubuntu has no effect since it is not used anywhere in the include files, and adding a non-existent include directory on Solaris does not hurt either. So I have unified code and achieved goal 1, but not as nicely as I had hoped.

    use Convert::Binary::C;
    $c = Convert::Binary::C->new( 
               Include => ['/usr/include', '/usr/include/i386-linux-gnu'], 
               Define => [qw(__sparc)] 
            );
    $c->parse_file('utmpx.h');
    
    But what Convert::Binary::C has achieved is no small task: all macros and definitions of the C include file are now available in Perl.

    In the next sections I will show how to explore a 'struct' and its elements (or members as the C folks prefer).

    Step 2: Determine The Size Of The 'struct'

    Again there is a difference between Solaris and Ubuntu: the 'struct' is named futmpx on Solaris and utmpx on Ubuntu. So the code will differ here, but note that most of the element names are equal, a feature which will be used later on.

    futmpx on Solaris:

    struct futmpx {
            char    ut_user[32];            /* user login name */
            char    ut_id[4];               /* inittab id */
            char    ut_line[32];            /* device name (console, lnxx) */
            pid32_t ut_pid;                 /* process id */
            int16_t ut_type;                /* type of entry */
            struct {
                    int16_t e_termination;  /* process termination status */
                    int16_t e_exit;         /* process exit status */
            } ut_exit;                      /* exit status of a process */
            struct timeval32 ut_tv;         /* time entry was made */
            int32_t ut_session;             /* session ID, user for windowing */
            int32_t pad[5];                 /* reserved for future use */
            int16_t ut_syslen;              /* significant length of ut_host */
            char    ut_host[257];           /* remote host name */
    };
    

    utmpx on Ubuntu (in fact I shortened it a little for better readability):

    struct utmpx
    {
      short int ut_type;  /* Type of login.  */
      __pid_t ut_pid;  /* Process ID of login process.  */
      char ut_line[__UT_LINESIZE]; /* Devicename.  */
      char ut_id[4];  /* Inittab ID. */
      char ut_user[__UT_NAMESIZE]; /* Username.  */
      char ut_host[__UT_HOSTSIZE]; /* Hostname for remote login.  */
      struct __exit_status ut_exit; /* Exit status of a process marked
           as DEAD_PROCESS.  */
      long int ut_session;  /* Session ID, used for windowing.  */
      struct timeval ut_tv;  /* Time entry was made.  */
      __int32_t ut_addr_v6[4]; /* Internet address of remote host.  */
      char __unused[20];  /* Reserved for future use.  */
    };
    
    Another important consideration at this point relates to the Perl code above, where we defined the template for Perl's unpack function. Some extra bytes were introduced into the template to get the correct alignment. The alignment has to be set in Convert::Binary::C as well, but in a much easier way. One could set the alignment to one, two, ... byte boundaries, but there is also the option to use the native alignment, and this is what I'm going to do. With the alignment set, the 'sizeof' method returns the size of the 'struct'.
    # Choose native alignment
    $c->configure( Alignment => 0 );   # the same on both OSs
    
    $sizeof = $c->sizeof('futmpx');    # on Solaris (=372)
    
    $sizeof = $c->sizeof('utmpx');     # on Ubuntu  (=384)
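
    To see where the 372 bytes on Solaris come from, the layout can be worked out by hand. The following sketch does not use Convert::Binary::C; the layout() helper is hypothetical, and the member sizes are my assumption (32-bit pid32_t, int32_t and timeval32 fields, 16-bit int16_t, as the typedef names suggest, each aligned to its own size).

```perl
use strict;
use warnings;

# Each member: [ name, size in bytes, required alignment in bytes ].
my @futmpx = (
    [ ut_user       => 32,  1 ],
    [ ut_id         => 4,   1 ],
    [ ut_line       => 32,  1 ],
    [ ut_pid        => 4,   4 ],
    [ ut_type       => 2,   2 ],
    [ e_termination => 2,   2 ],
    [ e_exit        => 2,   2 ],
    [ ut_tv         => 8,   4 ],   # two 32-bit values
    [ ut_session    => 4,   4 ],
    [ pad           => 20,  4 ],   # int32_t[5]
    [ ut_syslen     => 2,   2 ],
    [ ut_host       => 257, 1 ],
);

# Walk the members, rounding each offset up to the member's alignment,
# then round the total size up to the strictest alignment seen.
sub layout {
    my @members = @_;
    my ($offset, $max_align, %offset_of) = (0, 1);
    for my $m (@members) {
        my ($name, $size, $align) = @$m;
        $offset += ($align - $offset % $align) % $align;   # padding
        $offset_of{$name} = $offset;
        $offset += $size;
        $max_align = $align if $align > $max_align;
    }
    $offset += ($max_align - $offset % $max_align) % $max_align;
    return ($offset, \%offset_of);
}

my ($size, $offsets) = layout(@futmpx);
print "ut_tv starts at $offsets->{ut_tv}, struct size is $size\n";
# prints: ut_tv starts at 80, struct size is 372
```

    The two pad bytes in front of ut_tv and the single byte after ut_host are exactly the 'b b' and the trailing 'b' of the unpack template.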
    

    Step 3: Read Log Entries wtmpx

    Since the filenames are different they need to be defined differently (or passed as an argument to the Perl script).
    $wtmpx = "/var/adm/wtmpx";   # on Solaris, or /var/adm/utmpx
    
    $wtmpx = "/var/log/wtmp";    # on Ubuntu,  or /var/run/utmp
    
    or of course saved copies of these files could be used too (many admins keep the last n copies and rotate filenames).

    Finally these files can be read and the elements of the structure can be accessed.

    open(WTMPX, $wtmpx) || die("Cannot open $wtmpx\n");
    while( read(WTMPX, $buffer, $sizeof) == $sizeof ) {
    
      $unpacked = $c->unpack('futmpx', $buffer);   # Solaris
    
      $unpacked = $c->unpack('utmpx', $buffer);    # Ubuntu
    
      # And now do something with the content
      # ...
    }
    close(WTMPX);
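
    The read loop itself can be tried without a real wtmpx file. The sketch below builds two records in memory and reads them back in fixed-size chunks; the 12-byte layout 'A8 l' is made up for illustration and is not the utmpx layout, and plain unpack stands in for $c->unpack.

```perl
use strict;
use warnings;

# Made-up record layout for illustration only: 8-byte name, 32-bit pid.
my $template = 'A8 l';
my $sizeof   = length pack($template, '', 0);

# Two complete records plus 5 stray bytes, as in a truncated file.
my $data = pack($template, 'andreash', 1506)
         . pack($template, 'root', 1)
         . 'xxxxx';

# Open the scalar as an in-memory file. binmode makes no difference
# here but matters for real binary files on some platforms.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
binmode $fh;

my @records;
while ( read($fh, my $buffer, $sizeof) == $sizeof ) {
    my ($name, $pid) = unpack($template, $buffer);
    push @records, "$name:$pid";
}
close $fh;

print join(', ', @records), "\n";
# prints: andreash:1506, root:1
```

    The incomplete chunk at the end is silently dropped because read returns fewer than $sizeof bytes, which is the same behaviour as in the loop above.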
    
    Accessing the elements of the 'struct' is quite easy: use the names of the members as defined in the include file, e.g. $unpacked->{ut_pid} to get the process id of the entry. For certain types of variables though (e.g. strings) some special handling is needed. Convert::Binary::C returns an array of characters as a Perl array of numeric character codes. If one wants a nice string some conversion needs to be done, and I am using the Perl pack recipe.
    @u = @{$unpacked->{ut_user}};      # Save the ut_user array of chars
    $ut_user = pack( "C*", @u);        # Convert it to a string and save 
                                       # it in a new variable. Of course 
                                       # these two steps could be combined into one
    
    Here is some code which retrieves the user, process id, terminal line and exit code. Contrary to the first Perl solution (with unpack) it does not matter whether ut_user is the first (Solaris) or fifth (Ubuntu) element of the 'struct'. Convert::Binary::C has made this element accessible by name and we don't need to care about its position, a welcome simplification.
    use Convert::Binary::C;
    
    $utmpxh = "utmpx.h";               # include file
    
    # two OS specific settings
    $struct = "futmpx";
    $wtmpx  = "/var/adm/wtmpx";        # on Solaris, or /var/adm/utmpx
    
    
    $c = Convert::Binary::C->new( 
               Include => ['/usr/include', '/usr/include/i386-linux-gnu'], 
               Define => [qw(__sparc)] 
            );
    $c->parse_file( $utmpxh );
    
    # Choose native alignment
    $c->configure( Alignment => 0 );    # the same on both OSs
    $sizeof = $c->sizeof( $struct );    # on Solaris (=372)
    
    open(WTMPX, $wtmpx) || die("Cannot open $wtmpx\n");
    while( read(WTMPX, $buffer, $sizeof) == $sizeof ) {
    
      $unpacked = $c->unpack( $struct, $buffer);   # Solaris
      $ut_user = pack( "C*", @{$unpacked->{ut_user}} );
      $ut_line = pack( "C*", @{$unpacked->{ut_line}} );
      
      print $ut_user, 
            " ", $unpacked->{ut_pid},
            " ", $ut_line,
            " ", $unpacked->{ut_exit}->{e_exit},
            "\n";
    }
    close(WTMPX);
    
    Note the exit code entry: ut_exit is a struct which is a member of the main struct futmpx. $unpacked->{ut_exit}->{e_exit} gets an element of ut_exit.

    Step 4: a platform independent script (?)

    Did I achieve a completely platform independent script? No.

    Even with that great module (which does a lot of things behind the scenes) I could not achieve complete independence. I could live with the different file names, but the different 'struct' names and, to a lesser extent, the compile settings (include paths, preprocessor macros like __sparc) require some platform specific coding.

    What have I achieved (other than learning about a great module)? I could eliminate the unpack template, i.e. the cumbersome manual sequence of types, their counts and their alignments, and use the system's definitions instead. This is what I was looking for initially, after all.
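
    One way to keep the platform specific parts in a single place is a small lookup table. This is only a sketch: the table layout and the settings_for() helper are hypothetical, and the values are simply the ones used in the steps above.

```perl
use strict;
use warnings;

# Per-OS settings collected from the steps above.
my %settings = (
    solaris => {
        struct  => 'futmpx',
        file    => '/var/adm/wtmpx',
        include => ['/usr/include'],
        define  => ['__sparc'],
    },
    linux => {
        struct  => 'utmpx',
        file    => '/var/log/wtmp',
        include => ['/usr/include', '/usr/include/i386-linux-gnu'],
        define  => [],
    },
);

# Return the settings for an OS name, or die for an unknown one.
sub settings_for {
    my ($os) = @_;
    return $settings{$os} || die "No utmpx settings known for '$os'\n";
}

my $s = settings_for('solaris');
print "$s->{struct} $s->{file}\n";
# prints: futmpx /var/adm/wtmpx
```

    In a real script the argument would be Perl's $^O variable, which is 'solaris' or 'linux' on these two systems, and $s->{include} and $s->{define} would be passed to Convert::Binary::C->new as Include and Define.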


    A little fine tuning of the string conversion

    In the script above I used the following to convert an array of characters to a string.
    $ut_user = pack( "C*", @{$unpacked->{ut_user}} );
    
    In fact I was cheating a little. When you print $ut_user it looks OK in the terminal, but when you inspect the output you'll find that it is actually a 32 byte string consisting of the real characters padded with NULL bytes to the full length (I use spaces here to better illustrate the issue).
    a n d r e a s h \0 \0 \0 \0 ....
    
    Using this string in comparisons will fail:
    if( $ut_user eq "andreash" ) { print "found\n" }   # the body is never reached
    

    There are two ways to resolve this.

  • Use Perl's unpack function with the 'Z*' template (available in newer Perl versions). It will remove the trailing NULL bytes.
    $ut_user = pack( "C*", @{$unpacked->{ut_user}} );   # a n d r e a s h \0 \0 \0 ...
    $ut_user = unpack( "Z*", $ut_user);                 # a n d r e a s h
    
  • Use the conversions offered by Convert::Binary::C
    # Tell the conversion object that its 'ut_user' element is a string
    $c->tag( $struct.'.ut_user', Format => "String" );
    
    # No further conversion is required
    # Use   $unpacked->{ut_user}   as is, it is equal to 'andreash', no more NULL bytes
    
    I.e. instead of applying a pair of pack/unpack operations to the actual data in your code, tell Convert::Binary::C to do this for you by configuring the conversion object accordingly. After the data have been read they are already formatted and usable as is.

    These conversions work for both Solaris and Ubuntu in the same way and should be applied to the other strings 'ut_line', 'ut_host' etc. too.
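
    The effect of the trailing NULL bytes, and of the first fix, can be reproduced without any wtmpx file. In this sketch the array of character codes is simulated by hand; it stands in for what $unpacked->{ut_user} holds for a char[32] member.

```perl
use strict;
use warnings;

# Simulated char[32] member: the codes of the name followed by zeros.
my @chars = ( unpack('C*', 'andreash'), (0) x 24 );

my $raw   = pack('C*', @chars);     # 32 bytes, padded with NULL bytes
my $clean = unpack('Z*', $raw);     # everything before the first NULL byte

printf "raw length %d, clean length %d\n", length($raw), length($clean);
# prints: raw length 32, clean length 8

print "match\n" if $clean eq 'andreash' and $raw ne 'andreash';
# prints: match
```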
