Extract the process ID (pid) and program name from the header line of pmap.
The strings can take these forms from simple to complex:
    123: cmd
    123: cmd -x foo
    123: /usr/bin/cmd
    123: /usr/bin/cmd -x foo
and more complex forms with more parameters, which are trickier to parse:
    123: /usr/bin/cmd -x /home/foo
    123: /usr/bin/cmd -x 456: -d /home/foo
i.e. very generally speaking, there is a pid followed by a colon and then a more or less complex command line, where the program name can be fully qualified and carry a number of parameters. The last example deliberately reuses digits and a colon as a parameter.
Here is a try to express the string more verbally as a sequence of
There are various solutions to this in Perl; here I'll show two.
    # Example string; "\t" is the tab that follows the colon
    my $str = "123:\t/usr/bin/cmd -x /home/foo";

    # First I split the string using an optional colon :*
    # and a sequence of white space \s+ as field delimiters.
    # This gives me the pid and the program name and strips off the parameters.
    my ($pid, $cmd) = split /:*\s+/, $str;

    # In case of a fully qualified program name,
    # everything up to the last slash needs to be removed.
    $cmd =~ s/.*\///;

    print "pid = $pid X cmd = $cmd\n";
Always looking for more concise code, I wondered whether the split and the substitution couldn't be combined. Here is a one-liner which, of course, requires some explanation.
    # Example string; "\t" is the tab that follows the colon
    my $str = "123:\t/usr/bin/cmd -x /home/foo";

    # I try to match the following regular expression:
    #   a sequence of digits (\d+), which becomes $1 if successful
    #   a colon and a tab
    #   an optional sequence of characters ending in a slash (\S+\/)*,
    #   which becomes $2
    #   a sequence of characters (\S+), which becomes $3
    # The remainder of the string is not important, as
    # we anchor the regular expression at the beginning.
    $str =~ /^(\d+):\t(\S+\/)*(\S+)/;

    print "pid = $1 X cmd = $3\n";
For readability I would have preferred the first version, but on closer inspection I found a flaw in it: the handling of malformed strings. Take the string below, where the colon after the pid is missing and an extra word sits between the pid and the program name:
    $str = "123 xyz /usr/bin/cmd -x 456: /home/foo";
The two versions produce:
    # Code 1
    pid = 123 X cmd = xyz
    # Code 2
    pid = /home/ X cmd =
In both cases the string is taken apart at the wrong place, with unforeseeable results.
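One likely reason the second version prints leftovers: Perl does not reset $1, $2, $3 when a match fails, so they keep whatever the last successful match in the same scope left behind. Here is a minimal sketch of that effect; the helper name stale_captures is made up for illustration and is not part of my original script.

```perl
use strict;
use warnings;

# Hypothetical demo helper: match a well-formed line, then a malformed one,
# and return what the capture variables hold afterwards.
sub stale_captures {
    my ($good, $bad) = @_;
    $good =~ /^(\d+):\t(\S+\/)*(\S+)/;   # matches and sets $1 .. $3
    $bad  =~ /^(\d+):\t(\S+\/)*(\S+)/;   # fails; $1 .. $3 keep the old values
    return ($1, $3);
}

my ($pid, $cmd) = stale_captures(
    "123:\t/usr/bin/cmd -x /home/foo",
    "123 xyz /usr/bin/cmd -x 456: /home/foo",
);
print "pid = $pid X cmd = $cmd\n";   # values come from the first string
```

What ends up in the variables therefore depends on whatever matched earlier in the program, which is why an unchecked match cannot be trusted.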
I can use the second code though to its advantage by applying a check.
    if ( $str =~ /^(\d+):\t(\S+\/)*(\S+)/ ) {
        print "pid = $1 X cmd = $3\n";
    }
i.e. only when the regular expression really matches will I use its values. The check gives me assurance.
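To make the check reusable, the match can be wrapped in a small sub that returns nothing on failure. This is only a sketch: the name parse_pmap_header is an assumption for illustration, and the regular expression is the one from above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pid, program name) for a well-formed
# header line and an empty list otherwise.
sub parse_pmap_header {
    my ($line) = @_;
    if ( $line =~ /^(\d+):\t(\S+\/)*(\S+)/ ) {
        return ( $1, $3 );
    }
    return;
}

for my $line ( "123:\t/usr/bin/cmd -x 456: -d /home/foo",
               "123 xyz /usr/bin/cmd -x 456: /home/foo" ) {
    # The list assignment is false when the sub returns an empty list.
    if ( my ( $pid, $cmd ) = parse_pmap_header($line) ) {
        print "pid = $pid X cmd = $cmd\n";
    }
    else {
        print "not a pmap header: $line\n";
    }
}
```

The caller never touches $1 or $3 directly, so stale capture values cannot leak out.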
I can't do this with the split in the first version, other than by adding a post-check (verifying that the pid really consists of digits, etc.), which would make the code longer.
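For comparison, here is roughly what the split with a post-check could look like. It is a sketch under assumptions: the name split_pmap_header is invented, and besides the digits test I assume a second test is needed to confirm that a colon directly follows the pid, since the split pattern treats the colon as optional.

```perl
use strict;
use warnings;

# Hypothetical helper: split first, validate afterwards.
# Returns (pid, program name) or an empty list.
sub split_pmap_header {
    my ($line) = @_;
    my ( $pid, $cmd ) = split /:*\s+/, $line;
    return unless defined $pid && defined $cmd;
    return unless $pid =~ /^\d+$/;          # pid must be all digits
    return unless $line =~ /^\Q$pid\E:/;    # and be followed by a colon
    $cmd =~ s/.*\///;                       # strip the directory part
    return ( $pid, $cmd );
}

my @fields = split_pmap_header("123:\t/usr/bin/cmd -x /home/foo");
print "pid = $fields[0] X cmd = $fields[1]\n" if @fields;
```

That is three extra lines of validation for what the regular expression checks implicitly.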
So I decided to use the regular expression in my code since it is still fairly readable by extracting just three parts of the overall string.
If I wanted to extract more, say five or eight components, I would probably fall back to the split and a subsequent validity check.
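A rough sketch of that fallback is shown here. The sub name split_header_fields and the hash keys are assumptions made for illustration. Note that it deviates from the first version in two ways: the colon is mandatory in the split pattern, and a limit of 2 keeps the rest of the command line in one piece before it is split into words.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with the components
# of the header line, or undef if the line does not look valid.
sub split_header_fields {
    my ($line) = @_;
    # Split only at the first "colon + white space".
    my ( $pid, $rest ) = split /:\s+/, $line, 2;
    return unless defined $rest && $pid =~ /^\d+$/;
    # Split the command line into the program path and its parameters.
    my ( $path, @args ) = split ' ', $rest;
    return unless defined $path;
    ( my $cmd = $path ) =~ s/.*\///;
    return { pid => $pid, path => $path, cmd => $cmd, args => \@args };
}

my $f = split_header_fields("123:\t/usr/bin/cmd -x 456: -d /home/foo");
print "pid = $f->{pid} X cmd = $f->{cmd} X args = @{ $f->{args} }\n" if $f;
```

With more components the validity checks grow with the number of fields, but each one stays a simple, readable test.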