While it may not happen often, you may sometimes want to copy text from an image into a document. You can certainly type the text manually, which is fine if it consists of a few words or sentences. But what if the image is full of text? Maybe you have received a fax, or a copy of a document in image format that someone attached to an email.
Gttext is a free, open source program for the Windows operating system that identifies text in images and copies it to the Windows clipboard. The program, whose full name is Ground Truthing tool for Color Images with Text, needs to be installed before it can be used.
The program supports a variety of image formats, including the popular jpg and png formats as well as bmp, tiff and gif. You start by loading an image into the program. One issue that I had was with the file filter in the file browser: it offers a separate filter for each image format, so I had to switch to the right filter before the image file would appear.
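If you want to see every supported image in a folder at a glance before opening one in the program, a combined filter is easy to sketch outside of Gttext. The following Python snippet is only an illustration and not part of the program: the function names are hypothetical, and treating `.jpeg` and `.tif` as aliases for jpg and tiff is an assumption.

```python
from pathlib import Path

# Extensions based on the formats named in the article.
# ".jpeg" and ".tif" are assumed aliases for jpg and tiff.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".gif"}


def is_supported_image(name: str) -> bool:
    """Return True if the file name ends in one of the supported extensions."""
    return Path(name).suffix.lower() in SUPPORTED_EXTENSIONS


def list_supported_images(folder: str) -> list[str]:
    """Return the sorted names of all supported image files directly inside folder."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and is_supported_image(entry.name)
    )
```

Calling `list_supported_images` with the path of a scan folder returns the file names of all jpg, png, bmp, tiff and gif images in it, regardless of the case of the extension.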
In the best case, all you then need to do is draw a rectangle around the text on the image that you want to copy. The program displays the text that it identified in a popup with options to cancel, try again or continue (copy to clipboard).
Try again runs the text recognition again to correct possible errors made in the previous run. The program also includes various tools to optimize the image for text identification, among them zooming in or out and modifying the document's brightness.
Another interesting feature is the ability to extract all text at once without selecting the text first. This is done with a click on Tools > Copy Text From > Full Image.
The text recognition algorithm of Gttext is solid and worked very well on several document scans that I had in image format on my PC. You do need to go over the results though, as they may contain errors that you need to correct manually.
Windows users can download Gttext from the project's Google Code website. The program is compatible with 32-bit and 64-bit editions of the Microsoft Windows operating system.
Update: The program is no longer hosted on Google Code because Google Code has shut down. You now find it on its own domain, SoftOCR, from where it can be downloaded.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.