Encoding in text files

2007-10-09T18:23:19Z
Dave Pawson

I was reading Rick's post on encodings, then followed it up with his old xtext ideas. What's that saying about as simple as necessary, and no more? I think xtext gets it about right. This is a Java example from Rick.

//xtext encoding="ISO-8859-1" refo="%x" refc="%"
package com.topologi.tme1.editor;
// some text with a character reference here:  %x4444% 

Couldn't be much easier? refo opens a character reference, refc closes it, and the example says the rest.
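To make the idea concrete, here is a minimal sketch of how a reader might pick the delimiters out of the header line and expand the references. It is my own illustration, not Rick's code: the class and method names are made up, and it assumes the digits between refo and refc are a hexadecimal code point, which is what the %x in the example suggests.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of xtext itself.
class XtextSketch {

    // Pull one attribute value (e.g. refo="%x") out of the xtext header line.
    // Returns null when the attribute is absent.
    static String attr(String header, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "=\"([^\"]*)\"")
                           .matcher(header);
        return m.find() ? m.group(1) : null;
    }

    // Replace each refo + hex digits + refc with the character it names.
    // Assumption: the digits are a hexadecimal Unicode code point.
    // No range checking is done; an out-of-range value will throw.
    static String decode(String line, String refo, String refc) {
        Pattern p = Pattern.compile(
            Pattern.quote(refo) + "([0-9A-Fa-f]+)" + Pattern.quote(refc));
        Matcher m = p.matcher(line);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start());                      // text before the reference
            out.appendCodePoint(Integer.parseInt(m.group(1), 16));  // the referenced character
            last = m.end();
        }
        out.append(line, last, line.length());                      // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "//xtext encoding=\"ISO-8859-1\" refo=\"%x\" refc=\"%\"";
        String body = "// some text with a character reference here:  %x4444% ";
        System.out.println(attr(header, "encoding"));
        System.out.println(decode(body, attr(header, "refo"), attr(header, "refc")));
    }
}
```

The delimiters are quoted before going into the regular expression, so a file is free to choose characters such as % or $ that would otherwise mean something to the regex engine.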

Some time back I wrote some code to hack regular text into XML and posted it. This last weekend I updated it to use XOM after someone noted that it had a bug, which I corrected. Then a question came back about something else, with an example plain text file attached. I noticed the encoding issue and had a pang: I'd just been reading the file from disk, with no concern for its encoding. What was really sweet was the ease with which the highly decorated Java input can be tweaked to specify the encoding. It must have been all of 20 minutes between Googling for a how-to and offering users a full listing of the available encodings, then bolting the chosen one into the read. I nearly tweaked the output XML too, then decided to leave it at UTF-8. The listing of encodings is scarily long too! Must be about 60 of them!
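For anyone wanting to do the same, the change looks roughly like the sketch below. This is a reconstruction for illustration rather than the code I posted: the class and method names are invented, and it assumes the encoding arrives as a plain name such as "ISO-8859-1". The listing comes from Charset.availableCharsets(), and the name is handed to the InputStreamReader in the middle of the usual stack of decorators.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading a text file with a user-chosen encoding.
class EncodedRead {

    // Names of every encoding this JVM supports, in sorted order.
    static List<String> availableEncodings() {
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    // Read all lines of a file, decoding its bytes with the named encoding.
    // Charset.forName throws an unchecked exception for an unknown name.
    static List<String> readLines(String fileName, String encoding) throws IOException {
        Charset cs = Charset.forName(encoding);
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            // No file and encoding given: show what the user can choose from.
            for (String name : availableEncodings()) {
                System.out.println(name);
            }
            return;
        }
        for (String line : readLines(args[0], args[1])) {
            System.out.println(line);
        }
    }
}
```

Run without arguments it prints the list of encodings; the count depends on the JVM, so expect it to differ from machine to machine.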

Keywords: java
