2008-06-22T16:19:18Z
Dave Pawson.
Is it worth it?
Quite a few times I've registered with a listserv and had a message back containing content I've always ignored, along the lines of:
--- Administrative commands for the XXXX list ---
I can handle administrative requests automatically. Please do not send them to the list address! Instead, send your message to the correct command address:
For help and a description of available commands, send a message to: <blah@lists.blah.org>
I can honestly say I've never looked at them before. You usually see them the first time you sign up for a list. Well, this weekend I had another look. Rather sinister reason, but that's another matter. I wanted to preserve an archive of a few hundred emails. I duly sent my emails and got back (in groups of 100) all the list mails... in plain text. No objection to plain text, just that if I want to process it I'd prefer XML.
Some time back I acquired a copy of Friedl's regex book. I complained to him that his faint markup was quite inaccessible. He simply replied by telling me how much trouble he'd gone to just to get the printer to do what he wanted. Ah well. Anyway, in that book I recalled a tiny section on hacking a reply to an email with perl. Page 57, second edn. So I took a look.
I've always had a begrudging admiration for perl hackers. ndw is the one that comes to mind first. I'm guessing it's the first tool he reaches for when he wants to munge some text. I've always agreed with the 'line noise' description of perl code. Possibly more so when used for (what I take to be) its base use, that of regex processing regular text files. Anyway, I had a go. I'm posting it here simply because I'm guessing that this listserv isn't the only one to use this software, hence other list archives may be hacked using this code to get some XML for further refining.
All it does is wrap what I thought useful in markup. It also strips the domain from actual email addresses, enough to make them anonymous. Hope you find it useful. Look out for linewraps.
You'll have to change the list-specific bits (the header, for instance). Check for XXXX.
#!/bin/perl
#usage: perl -w mkreply $1 >op.xml
#Rev 2. 2008-06-22T14:15:12Z.
# Process the header
print "<archive>";
$inmsg="";    # true once the first forwarded message has been seen
$inhdr="0";   # true while inside the administrative header
while ($line = <>){
    # Escape XML special characters; & must be escaped first
    $line =~s/&/&amp;/g;
    $line =~s/</&lt;/g;
    # Anonymise email addresses: keep the local part, drop the domain
    $line =~ s/([A-Z0-9._%+-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}/$1.../gi;
    if ($line =~ m/^--- Administrative commands for the XXXX list ---$/){
        $inhdr=1;
        print "\n<hdr>";
    }elsif ($inhdr){
        # Copy header lines through until the closing rule of dashes
        print $line;
        if ( $line =~ m/^----------------------------------------------------------------------$/){
            $inhdr=0;
            print "</hdr>\n";
        }
    }elsif ($line =~ m/^Topics \(messages ([0-9]+) through ([0-9]+)\):$/){
        print "<messages><st>$1</st><end>$2</end></messages>\n";
    }elsif ($line =~ m/\s+([0-9]+) by: (.*)$/){
        # Index entry: message number and author
        print "<msg>";
        print $1;
        print "</msg>";
        print "<auth>";
        print $2;
        print "</auth>\n";
    }elsif ($line =~ m/^Re: (.*)$/) {
        print "<subject>$1</subject>\n";
    }elsif ($line =~ m/^(---------- Forwarded message ----------)$/) {
        # Start of a message; close the previous one if there was one
        if ($inmsg){
            print "</message>\n";
        }
        print "\n<message>\n";
        $inmsg = "1";
    }elsif ($line =~/^To:/) {
        print "<!--List message -->";
    }elsif ($line =~/^Date:(.*)$/){
        print "<date>";
        print $1;
        print "</date>\n";
    }elsif ($line =~/^Subject: (.*)/) {
        print "<subject>";
        print $1;
        print "</subject>\n";
    }elsif ($line =~ m/^From: "([^"]+)" &lt;(\S+)>/){
        # The '<' was escaped to &lt; above, so match that form here
        print "<fm><nm>";
        print $1;
        print "</nm><email>";
        print $2;
        print "</email>";
        print "</fm>\n";
    }elsif ($line =~ m/^(.*)$/ && (not($inmsg))){
        # Anything else before the first message is treated as a subject line
        print "<subj>$1</subj>\n";
    }else {
        print $line;
    }
}
# Only close a message if one was actually opened
print "</message>" if $inmsg;
print "</archive>";
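If you want to see what the escaping and anonymising steps do before letting the script loose on a real archive, here is a minimal sketch of just those substitutions. The sub name clean_line and the sample address are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): escape the XML special
# characters, then drop the domain from anything shaped like an email address.
sub clean_line {
    my ($line) = @_;
    $line =~ s/&/&amp;/g;    # must run first, or it would re-escape the &lt; below
    $line =~ s/</&lt;/g;
    # Keep the local part, replace "@domain.tld" with "..."
    $line =~ s/([A-Z0-9._%+-]+)\@[A-Z0-9.-]+\.[A-Z]{2,4}/$1.../gi;
    return $line;
}

# Invented sample line, in the shape of a From: header from the archive
print clean_line(qq{From: "A. Person" <a.person\@example.org>\n});
```

Run on that sample it prints From: "A. Person" &lt;a.person...> which is the form the From: branch of the script then has to match.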
Keywords: perl