Python text file

Processing regular text files in Python

Although I'm not totally at home with Python, I regularly turn to it whenever I need to do something quickly or to experiment, when I want to think about the problem rather than the language I'm using. I must admit to taking slight advantage of Uche and friends on the 4suite list (#4suite at irc.freenode.org), but I can rely on solid answers and tolerance of my ignorance, for which thanks.

Elsewhere on this site (enotes), I've noted how I used Python to process simple plain text into usable XML. I've now made use of that idiom a couple of times, and being fairly familiar with the code, decided to see if I could create a pattern from it. Since I found it so easy to do, I thought others might like it too. Python seems so much at home in this class of problem, and I doubt I'm the only one wanting to parse regular text.

Problem domain: any text file (it helps if you know the encoding) with a regular format, in which the markers (structure, tokens, call them what you will) start a line. For example:


token1 content

token2 content .....
multiple lines of content...
separated from next block by >1 newline character

tokenn content

Quite simple in its most basic form. On one occasion I wanted to allow continuation lines, i.e. lines separated from the token starter line by a single newline character. I did this by defining my 'block', then parsing the input file for blocks instead of lines. Again, quite straightforward in Python.
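As a rough illustration of parsing for blocks rather than lines, here is a minimal sketch. It is not the code from despatch.py: the name split_blocks is my own invention for this example, and I'm assuming Python 3 and that a block ends at one or more blank lines (lines that are empty or hold only spaces or tabs).

```python
import re


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines.

    Hypothetical helper for illustration only; not taken from despatch.py.
    """
    # Normalise Windows line endings so only "\n" needs handling.
    text = text.replace("\r\n", "\n")
    # A newline followed by one or more blank lines ends a block.
    # A single newline does not, so continuation lines stay in the block.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    return [block for block in blocks if block.strip()]


sample = (
    "token1 content\n"
    "\n"
    "token2 content\n"
    "multiple lines of content\n"
    "\n"
    "\n"
    "tokenn content\n"
)

blocks = split_blocks(sample)
for block in blocks:
    print(repr(block))
```

Each item in blocks then starts with its token, and any continuation lines travel with it, still separated by single newline characters.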

The reason I call this the despatcher pattern is that I have a list of tokens, with a processor (a function or method) for each one. The list is used both for the regex processing (finding the tokens) and for calling the correct function once a token value has matched. I thought it was really, really neat. Look for the despatcher variable near the bottom of the test file.
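The idea can be sketched roughly as follows. This is an illustration under my own assumptions, not the contents of despatch.py: the tokens 'title' and 'para', the handler functions, and the XML element names are all made up for the example, and I'm assuming Python 3. The point is that one mapping drives both the regex and the function call.

```python
import re
from xml.sax.saxutils import escape


def handle_title(content):
    # Hypothetical processor for the made-up 'title' token.
    return "<title>%s</title>" % escape(content)


def handle_para(content):
    # Hypothetical processor for the made-up 'para' token.
    return "<para>%s</para>" % escape(content)


# The despatcher: one entry per token, mapping it to its processor.
despatcher = {
    "title": handle_title,
    "para": handle_para,
}

# Build the token-finding regex from the same mapping, so adding a
# token means adding one entry and nothing else.
token_re = re.compile(
    r"^(%s)\s+(.*)$" % "|".join(re.escape(token) for token in despatcher),
    re.DOTALL,
)


def process_block(block):
    """Return the processed block, or None if no known token starts it."""
    match = token_re.match(block)
    if match is None:
        return None
    token, content = match.groups()
    # Fold continuation lines into a single line of content.
    content = " ".join(content.split())
    return despatcher[token](content)


print(process_block("title Processing text"))
print(process_block("para first line\nsecond line"))
```

Supporting a new token is then a matter of writing its processor and adding one entry to the mapping; the regex picks it up automatically.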

In the despatch.py file, the function 'replaceit' demonstrates both block-based and line-based chunking, with full processing provided for both types. A demo file, despatch.txt, is provided, as is a usage string.

Enjoy. Mail me if you have questions. The GPL applies.