Tuesday, June 24, 2014

Easy When You Know How: Deskewing text scans under Linux


I've been scanning documents for ages, but more often recently than before, and especially in large numbers.

So I use the ADF (Automatic Document Feeder) on my HP-6400 scanner with XSane to direct the images to a Download directory.

But inevitably they are off by a fraction, no more than a fraction, like 0.5°, but my eye still catches it. And deems it unprofessional.

But if you have fifteen pages, opening each one to correct it is a royal PITA. And time-consuming to boot.

So I cast about for ways to fix it: 
  • Looked for GIMP plugins. There was one once upon a time called Deskew but it seems to have evanesced.
  • Looked for options under ImageMagick. Its convert and mogrify commands are very capable. There is a -deskew option, but I did not find any simple description of how to use it.
  • Looked for other options...
And found the cited program deskew.  It is current, and has versions for Linux (both 32- and 64-bit), Mac, and Windows. You can recompile it if you really want to, but it comes with a Bin subdirectory containing precompiled versions for all three OS.

And it is a positive dream to use. Most of the parameters come preset with defaults. I changed the hexadecimal RRGGBB background (-b) parameter to white (ffffff) from its default black (000000), but frankly haven't had the time to examine whether that was really necessary.

So the command at its simplest for me was

     deskew -o output.jpg -b ffffff input.jpg

Yes, you have to know what you're doing: unzip the download and move the entire resulting Deskew directory to somewhere useful, e.g., /data/graphics. You also need to

     chmod +x deskew

to make it executable and adjust your .bashrc to include that path, or simply prepend it to the deskew command:

     /data/graphics/Deskew/Bin/deskew ...
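
For the .bashrc route, one line is enough. This is only a sketch, assuming the Deskew directory sits under /data/graphics as in my example; adjust the path to wherever you put it.

```shell
# Add to the end of ~/.bashrc so the shell can find the deskew binary.
# Assumes the install location used above: /data/graphics/Deskew/Bin
export PATH="$PATH:/data/graphics/Deskew/Bin"
```

Open a new terminal (or run source ~/.bashrc) for the change to take effect.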

So, generally, a breeze.
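
For a stack of pages like the fifteen I mentioned, a small shell function saves running the command by hand each time. This is a rough sketch, not anything that ships with the program: the function name deskew_all, the DESKEW variable, and the deskewed output subdirectory are my own inventions, and it uses only the -o and -b parameters shown above.

```shell
# Deskew every .jpg in a directory, writing results to a "deskewed" subdirectory.
# DESKEW (hypothetical variable) may hold the full path to the binary;
# otherwise "deskew" is looked up on the PATH.
deskew_all() {
    src_dir="${1:-.}"                 # directory of scans; defaults to the current one
    out_dir="$src_dir/deskewed"       # originals are left untouched
    mkdir -p "$out_dir" || return 1
    for page in "$src_dir"/*.jpg; do
        [ -e "$page" ] || continue    # skip the literal pattern if nothing matched
        "${DESKEW:-deskew}" -o "$out_dir/$(basename "$page")" -b ffffff "$page"
    done
}
```

After pasting that into a terminal (or into .bashrc), something like deskew_all ~/Download should leave the straightened copies in ~/Download/deskewed.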

A very elegant piece of work.

Well done. 
