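Here I present a solution that uses two features available as extensions in XSLT 1.0, namely tokenize and node-set. XSLT 2.0 covers both natively: tokenize() is a built-in function, and variables holding temporary trees can be navigated directly, so node-set is no longer needed.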
tokenize splits a string into tokens in a single call, rather than calling substring-before and substring-after repeatedly in a recursive template. It departs somewhat from the usual template-driven style of XSLT, but it is very convenient.
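For comparison, here is a sketch of the recursive substring-before/substring-after approach that tokenize replaces. The template name split, the input element in and the temporary file names are made up for this illustration. With EXSLT the same split is a single call to strings:tokenize('Date,Num,Duration', ',').

```shell
#!/bin/sh
# Sketch: splitting a string in plain XSLT 1.0 without tokenize.
# Names (split, in, split.xsl, in.xml) are hypothetical.
dir=$(mktemp -d)
cat > "$dir/split.xsl" <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:call-template name="split">
      <xsl:with-param name="text" select="string(/in)"/>
      <xsl:with-param name="sep" select="','"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Emit the part before the first separator, then recurse on the rest -->
  <xsl:template name="split">
    <xsl:param name="text"/>
    <xsl:param name="sep"/>
    <xsl:choose>
      <xsl:when test="contains($text, $sep)">
        <xsl:value-of select="concat('[', substring-before($text, $sep), ']')"/>
        <xsl:call-template name="split">
          <xsl:with-param name="text" select="substring-after($text, $sep)"/>
          <xsl:with-param name="sep" select="$sep"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Last token: no separator left -->
        <xsl:value-of select="concat('[', $text, ']')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
EOF
echo '<in>Date,Num,Duration</in>' > "$dir/in.xml"
out=$(xsltproc "$dir/split.xsl" "$dir/in.xml")
echo "$out"
rm -rf "$dir"
```

Each token is printed in brackets, so the run should show the three header names one after another.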
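node-set is a powerful tool. In XSLT 1.0 a variable built from element content is a result tree fragment, which can only be copied to the output or treated like a string; node-set converts it into a real node set, so that XPath location paths and functions such as sum() can be applied to its nodes.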
The xsltproc version on my Mac contains some EXSLT extensions (you can list them with xsltproc --dumpextensions), so here are the namespaces that need to be declared at the beginning of the stylesheet:
Function | Namespace declaration |
---|---|
tokenize | xmlns:strings="http://exslt.org/strings" |
node-set | xmlns:common="http://exslt.org/common" |
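To check a particular build before relying on these functions, a sketch like the following can be used. It assumes that xsltproc --dumpextensions prints one {namespace-URI}name line per registered extension function, which is how libxslt formats that list; if your build prints it differently, look through the full output by hand instead.

```shell
#!/bin/sh
# Sketch: check that this xsltproc build registers the two EXSLT functions.
# Assumption: --dumpextensions prints one "{namespace-URI}name" line per function.
exts=$(xsltproc --dumpextensions 2>&1)
for fn in '{http://exslt.org/strings}tokenize' '{http://exslt.org/common}node-set'; do
  if printf '%s\n' "$exts" | grep -qF "$fn"; then
    echo "found:   $fn"
  else
    echo "missing: $fn"
  fi
done
```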
And here is how to use them:
Function | Usage |
---|---|
tokenize | I use tokenize in a for-each loop to split $someText at each $newline (the second argument is a set of delimiter characters): `<xsl:for-each select="strings:tokenize($someText,$newline)"> ... </xsl:for-each>` |
node-set | Transform the contents of a variable $lines into a node set $lineNodes: `<xsl:variable name="lineNodes" select="common:node-set($lines)" />` |
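A minimal, self-contained sketch of both functions working together follows. The file names demo.xsl and demo.xml and the element names in and item are made up for the illustration. The stylesheet splits a comma-separated string with tokenize, turns the resulting fragment into a node set, and then applies count() and sum() to it, which XSLT 1.0 would not allow on the plain variable.

```shell
#!/bin/sh
# Sketch: strings:tokenize and common:node-set with xsltproc.
# File and element names are hypothetical.
dir=$(mktemp -d)
cat > "$dir/demo.xsl" <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:strings="http://exslt.org/strings"
    xmlns:common="http://exslt.org/common">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Result tree fragment: one <item> per comma-separated token -->
    <xsl:variable name="items">
      <xsl:for-each select="strings:tokenize(string(/in), ',')">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </xsl:variable>
    <!-- node-set turns the fragment into nodes that XPath can navigate -->
    <xsl:variable name="itemNodes" select="common:node-set($items)"/>
    <xsl:value-of select="count($itemNodes/item)"/>
    <xsl:text> items, sum </xsl:text>
    <xsl:value-of select="sum($itemNodes/item)"/>
  </xsl:template>
</xsl:stylesheet>
EOF
echo '<in>3,4,5</in>' > "$dir/demo.xml"
out=$(xsltproc "$dir/demo.xsl" "$dir/demo.xml")
echo "$out"
rm -rf "$dir"
```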
All the work is done in the parseDelimited template, which follows fairly conventional procedural style. One loop splits the complete input at each newline. The first line is split at the delimiter into the header names; every other line is split at the delimiter into its individual fields. Everything is wrapped into elements as follows and stored in a variable. The pseudo-code is already close to the implementation.
```
element "data"
  for each line after the first
    tokenize the line by delimiter
    element "row"
      for each field n
        element "header n"
          content of field n
        end of element "header n"
      end for
    end of element "row"
  end for
end of element "data"
```
Here is the complete code.
```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:strings="http://exslt.org/strings"
    xmlns:common="http://exslt.org/common">

  <!-- From strings we use: tokenize
       From common we use: node-set -->

  <!-- Define delimiter (tab) and newline. Character references are used
       because literal tabs and newlines inside attribute values are
       normalized to spaces by the XML parser. -->
  <xsl:variable name="delim" select="'&#9;'" />
  <xsl:variable name="newline" select="'&#10;'" />

  <!-- Define node1 and node2 for the output -->
  <xsl:variable name="node1" select="'data'" />
  <xsl:variable name="node2" select="'row'" />

  <xsl:template match="/">
    <!-- Take whatever input is coming, don't care about 'fakeroot' -->
    <xsl:call-template name="root"/>
  </xsl:template>

  <xsl:template name="root">
    <!-- Call the line parser with the whole content of the file -->
    <xsl:call-template name="parseDelimited">
      <xsl:with-param name="delimitedText" select="." />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="parseDelimited">
    <xsl:param name="delimitedText" />

    <!-- Split the file content by newline -->
    <xsl:variable name="lines">
      <xsl:for-each select="strings:tokenize($delimitedText, $newline)">
        <line><xsl:value-of select="." /></line>
      </xsl:for-each>
    </xsl:variable>

    <!-- Create a node-set out of 'lines' so it can be used in XPath expressions -->
    <xsl:variable name="lineNodes" select="common:node-set($lines)" />

    <!-- The first line contains the header fields -->
    <xsl:variable name="first" select="$lineNodes/line[1]" />
    <xsl:variable name="headers">
      <xsl:for-each select="strings:tokenize($first, $delim)">
        <head><xsl:value-of select="." /></head>
      </xsl:for-each>
    </xsl:variable>

    <!-- Create a node-set out of 'headers' so it can be used in XPath expressions -->
    <xsl:variable name="headerNodes" select="common:node-set($headers)" />

    <!-- Loop through all lines; possible because it is a node set.
         This creates the actual XML content -->
    <xsl:variable name="output">
      <!-- Start tag <data> -->
      <xsl:element name="{$node1}">
        <xsl:value-of select="$newline" />
        <xsl:for-each select="$lineNodes/line">
          <!-- Skip the first (header) line -->
          <xsl:if test="position() > 1">
            <!-- Start tag <row> -->
            <xsl:element name="{$node2}">
              <!-- Split the line by 'delim' and create an element for each entry.
                   The element name is coming from the header line -->
              <xsl:for-each select="strings:tokenize(., $delim)">
                <xsl:variable name="p" select="position()" />
                <xsl:element name="{$headerNodes/head[$p]}">
                  <!-- Print the actual content, phew! -->
                  <xsl:value-of select="." />
                </xsl:element>
              </xsl:for-each>
            <!-- End tag </row> -->
            </xsl:element>
            <xsl:value-of select="$newline" />
          </xsl:if>
        </xsl:for-each>
      <!-- End tag </data> -->
      </xsl:element>
      <xsl:value-of select="$newline" />
    </xsl:variable>

    <xsl:variable name="all" select="common:node-set($output)" />

    <!-- Output of nodified elements -->
    <xsl:copy-of select="$all/*" />

    <!-- With a node-set one can now use its advantages, e.g. sum up all Num values -->
    <xsl:value-of select="$newline" />
    <xsl:element name="Sum_Num">
      <xsl:value-of select="sum($all/data/row/Num)" />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

One interesting piece here is how each field element gets its name from the matching header field:
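```xml
<xsl:element name="{$headerNodes/head[$p]}">
```

Here $p holds the position of the current field within its line, so field n is wrapped in an element named after header n. The variable p is more than a readability aid: inside the predicate, position() refers to the position of the head node being tested, so $headerNodes/head[position()] would select every header and the element would always be named after the first one.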
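The role of $p can be checked with a small sketch that evaluates both $headerNodes/head[$p] and $headerNodes/head[position()] for each token of a single data line. The header names mirror the sample output; the space-separated input line and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare head[$p] with head[position()] inside a tokenize loop.
# Input line and file names are hypothetical.
dir=$(mktemp -d)
cat > "$dir/pos.xsl" <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:strings="http://exslt.org/strings"
    xmlns:common="http://exslt.org/common">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:variable name="headers">
      <head>Date</head><head>Num</head><head>Duration</head>
    </xsl:variable>
    <xsl:variable name="headerNodes" select="common:node-set($headers)"/>
    <xsl:for-each select="strings:tokenize('20120415 13 2310', ' ')">
      <!-- $p is the token's position; position() inside the predicate
           below is the position of each head node instead -->
      <xsl:variable name="p" select="position()"/>
      <xsl:value-of select="concat($headerNodes/head[$p], '/', $headerNodes/head[position()])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
echo '<x/>' > "$dir/in.xml"
out=$(xsltproc "$dir/pos.xsl" "$dir/in.xml")
echo "$out"
rm -rf "$dir"
```

Each output line shows the $p lookup first and the position() lookup second; only the first one follows the columns, while the second always yields Date.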
This stylesheet, call it data.xsl, needs to be fed the same wrapped input as before. Here is the wrapper script I omitted last time:
```shell
#!/bin/sh
# A shell wrapper for non-XML parsing with XSLT
FILE=data.txt
FAKEROOT=fakeroot   # Important for XML completeness but will be skipped by XSLT
(
  echo "<?xml version=\"1.0\"?>"
  printf "<$FAKEROOT>"
  cat "$FILE"
  echo "</$FAKEROOT>"
) | xsltproc data.xsl -
```
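One limitation of this wrapper is that data.txt is pasted into the XML verbatim, so a field containing & or < makes the wrapped input ill-formed and xsltproc rejects it. A possible variant, sketched below with a hypothetical wrap_text helper, escapes those two characters with sed before wrapping; its output could be piped into xsltproc data.xsl - exactly as above.

```shell
#!/bin/sh
# Sketch: wrap a text file in a fake root element, escaping & and <.
# wrap_text is a hypothetical helper: $1 = input file, $2 = root element name.
wrap_text() {
  echo '<?xml version="1.0"?>'
  printf '<%s>' "$2"
  # Escape & first so the &lt; produced next is not escaped again
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' "$1"
  printf '</%s>\n' "$2"
}

# Usage with a small made-up tab-separated file
tmp=$(mktemp)
printf 'Name\tNote\nx\ta & b <c>\n' > "$tmp"
wrapped=$(wrap_text "$tmp" fakeroot)
echo "$wrapped"
rm -f "$tmp"
```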
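The result is as follows. Note the 71 in the last line, which is the sum of the Num column (13 + 9 + 16 + 5 + 28). Because Sum_Num follows the closing </data> tag as a second top-level element, the output is no longer a well-formed XML document; it is only there to show what becomes possible with node-set.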
```xml
<?xml version="1.0"?>
<data>
<row><Date>20120415</Date><Num>13</Num><Duration>2310</Duration></row>
<row><Date>20120510</Date><Num>9</Num><Duration>1470</Duration></row>
<row><Date>20120526</Date><Num>16</Num><Duration>3817</Duration></row>
<row><Date>20120701</Date><Num>5</Num><Duration>2269</Duration></row>
<row><Date>20120831</Date><Num>28</Num><Duration>4505</Duration></row>
</data>
<Sum_Num>71</Sum_Num>
```