Some Java tools

This is a bit of code to build an XML file from a CSV file. It's not unique, but I like it. It addresses the problems of embedded & or < characters, embedded quote marks, and even line breaks within a field. I needed the last of these because one of the files I play with derives from input generated by an HTML form, which contains line breaks.

As with lots of software, I was inspired by other people's software. The first was a Java implementation by Danny Ayers. The web site (http://www.isacat.net/2001/code/CSVtoXML.htm) appears not to be there at the moment. The second was the Python implementation from object-craft.com.au, which provides a C solution. Between them they had everything I wanted... except they weren't in Java, so I rewrote what they did in Java.

The primary functionality is to handle the following classes of CSV record:

1,990,2,"1 knitting gauge (Product no. DH21)"
2,990,1,"Each pack contains the following:-
one pair of green and one pair of white "
3,990,3,"1 easy-to-see row  &  < counter (Product no. DH85)"
4,990,4,"Cassette featuring knitting instructions and other useful information."
101,1170,1,"50 pegs: 5 each of black, ,""purplish"" brown,  orange and pink."

Note the variants. The second record is split over two lines. The third contains special characters which would break XML parsing. The last one nests quotation marks, just as found in a typical Microsoft export.
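To make those three cases concrete, here is a minimal sketch of a quote-aware CSV reader. It is not the code shipped in the jar; the class name CsvSketch and the method parse are made up for illustration. It assumes the usual conventions seen in the records above: a field may be wrapped in double quotes, a doubled quote inside a quoted field stands for one literal quote, and commas or line breaks inside quotes belong to the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; NOT the CSVToXML source.
class CsvSketch {

    /** Splits CSV text into records, each record being a list of field values. */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote means one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote of the field
                    }
                } else {
                    field.append(c);       // commas and line breaks are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote of a field
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {        // an unquoted line break ends the record
                fields.add(field.toString());
                field.setLength(0);
                records.add(fields);
                fields = new ArrayList<>();
            } else if (c != '\r') {        // ignore the CR of a CRLF line ending
                field.append(c);
            }
        }
        // Flush a final record that has no trailing line break.
        if (field.length() > 0 || !fields.isEmpty()) {
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "2,990,1,\"Each pack contains the following:-\none pair \"\n"
                   + "101,1170,1,\"50 pegs: black, ,\"\"purplish\"\" brown.\"\n";
        for (List<String> record : parse(csv)) {
            System.out.println(record.size() + " fields, last = [" + record.get(3) + "]");
        }
    }
}
```

The & and < in the third record need no special treatment at this stage; they only matter once the field values are written out as XML.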

Finally, having used a general Perl type of solution, I wanted to be able to specify the markup I wanted for any up-transform. Danny Ayers' solution did this IIRC, using some RFC format used for the Microsoft (and other?) ini files. So before you run this software, create a properties file along the lines of the one below, which describes the fields of the CSV file and specifies the markup to be used for each field.

comment=Generated using CSVToXML


This file is used with a CSV file having four fields per record. I want to wrap the whole file in an element named doc, wrap each record in an element named entry, and have the individual fields use elements named in the fields section. Hence the last field will be tagged as <description>....
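The wrapping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the packaged implementation: the real tool uses the XOM library, whereas this sketch builds the output with plain strings so that it runs without extra jars. The class XmlWrapSketch, its methods, and the field names id, code and seq are hypothetical; only doc, entry and description come from the text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; the real tool builds its output with XOM.
class XmlWrapSketch {

    /** Escapes the characters that would otherwise break XML element content. */
    static String escape(String s) {
        // '&' must be replaced first so the other entities are not escaped twice.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps every row in a record element and every field in its named element. */
    static String toXml(String root, String record,
                        List<String> names, List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(root).append(">\n");
        for (List<String> row : rows) {
            out.append("  <").append(record).append(">\n");
            // Stop at the shorter of the two lists if a row is incomplete.
            for (int i = 0; i < names.size() && i < row.size(); i++) {
                String name = names.get(i);
                out.append("    <").append(name).append('>')
                   .append(escape(row.get(i)))
                   .append("</").append(name).append(">\n");
            }
            out.append("  </").append(record).append(">\n");
        }
        out.append("</").append(root).append(">\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // "id", "code" and "seq" are made-up names; "description" is from the text.
        List<String> names = Arrays.asList("id", "code", "seq", "description");
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("3", "990", "3",
                          "1 easy-to-see row  &  < counter (Product no. DH85)"));
        System.out.print(toXml("doc", "entry", names, rows));
    }
}
```

Escaping at the point where a field value becomes element content is what keeps the third sample record from breaking the output; a library such as XOM does that escaping for you when it serializes.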

This allows me to generate valid XML with semantic markup, and cuts down a nasty XSLT phase. (Of course it's for XSLT usage! I have a hammer in my hand...)


As well as the Java source, I've packaged it up for those who want to simply use it as a jar file. This gives a simple command line of:

>java -jar CSVToXML.jar -?

CSVToXML 1.0 from Dave Pawson
Usage: java CSVToXML [options] {param=value}...
  -p filename     Take properties from named file
  -o filename     Send output to named file
  -i filename     Take CSV input from named file
  -t              Display version and timing information
  -?              Display this message

So a typical invocation might be:

>java -jar CSVToXML.jar -i inputFile.csv -o outputFile.xml -p propertyFile.txt

This specifies the input file, output file and properties file to be used.

That's it. If you have any problems, let me know. The javadoc is included in the zip file, as is a test CSV file with its associated properties file.

Here it is, zipped format.

You'll also need the XOM library. It is included in the zip, in an older version, but you can get the latest from here