Impressive OCR. Abbyy Finereader

2008-10-18T17:36:31Z
Dave Pawson

I've had a copy of Abbyy Finereader (version 7) for a year or two now, probably more. I only use it rarely, so I can't say I'm familiar with it beyond casual use. I had a small job to do today, scanning about 30 pages of an old textbook containing images and 1970s text (if that means much to you). The font in use reminded me of a classic typewriter font. Quite a few of the number 1 characters were actually I, and the zeros were easy to mix up with the letter O. It was clear enough, just poor type. I set to this morning using the same approach I'd used previously: scan, OCR and spell check, one page at a time. Due to my own mistake I deleted the group of files I'd created just a little too soon, so I had to start again.

Being innately lazy, I then looked for a smarter way of working. I scanned all the pages, recognised them together, then spell-checked them together. Boy, was I impressed! OK, my edition may be out of date, but the recognition rate was way higher than I'd expected. Some words were offered as potential mistakes just for checking, i.e. they were actually correct and could be ignored. Once I accepted that, I guess it took me about 10 minutes to spell check the lot, some 30 pages or so. Having done it once, the spell checker had remembered all the odd words (French in this case) which it had met, so it just steamed through it.

I wanted an HTML target (just an electronic copy of these pages), so I chose the dumb output. No formatting, no 'keep the fonts', no 'keep the layout'. The output was a clean HTML file (it even called up the strict DTD). My only gripe was that it used (properly) some HTML character entities... so I had to find a copy of the DTD with the entities, which I must admit I haven't used for a long time! Emacs sgml mode let me swap the basic paragraphs for headings to match the presentation, and hey presto, I had good copy!
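One way round the entity gripe, as a rough sketch only: rewrite the named entities as numeric character references, which an SGML or XML tool can read without the DTD's entity declarations. This is not something Finereader does for you, and it isn't how I handled it at the time. The function name `named_to_numeric` is made up for illustration, and the lookup table is Python's standard `html.entities.name2codepoint`.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML need no DTD, so leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &eacute;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace named HTML entities with numeric character references."""

    def swap(match):
        name = match.group(1)
        # Unknown names are left untouched rather than guessed at.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return ENTITY.sub(swap, text)
```

Run over the OCR output before opening it in the editor, something like `named_to_numeric(open("page.html").read())` would do, where `page.html` is just a placeholder file name.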

I was really really impressed with Finereader. The errors were on

I really can't complain about that in any way. Until I saw that the source was I (not 1) I kept asking why it wasn't learning and re-applying that knowledge, though I guess it was to some extent.
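For what it's worth, the I-for-1 and O-for-0 mix-up could be tidied up after the event with a few lines of script. The following is a minimal sketch under my own assumptions, not anything Finereader does internally: it only touches tokens made up entirely of digits and the look-alike letters I, l and O, and only when the token already contains at least one real digit. The function name `fix_digit_lookalikes` is hypothetical.

```python
import re

# A token made only of digits and the look-alike letters I, l, O.
LOOKALIKE_TOKEN = re.compile(r"\b[0-9IlO]+\b")

# Map each look-alike letter to the digit it was probably meant to be.
FIXES = str.maketrans({"I": "1", "l": "1", "O": "0"})


def fix_digit_lookalikes(text: str) -> str:
    """Turn I/l into 1 and O into 0 inside tokens that look like numbers."""

    def repair(match):
        token = match.group(0)
        # Only touch tokens that already contain a real digit,
        # so a lone "I" or a word such as "Ill" is left alone.
        if any(ch.isdigit() for ch in token):
            return token.translate(FIXES)
        return token

    return LOOKALIKE_TOKEN.sub(repair, text)
```

It's a blunt heuristic: a number typed entirely in look-alikes (say "IO" for 10) slips through, and a genuine code such as "O2" would be changed to "02", so the result still wants a read through.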

Ten out of ten for Abbyy from me. I'm using the professional version which, at £90 for version 9, is (IMHO) quite a bargain.

Keywords: ocr
