Sometimes text in a PDF document cannot be selected in a PDF reader such as Adobe Reader or Foxit Reader. This is usually the case when scanned pages have been embedded in the PDF file as images.
One option for working with the text in such documents is to use OCR technology to convert the scanned content into text you can edit.
OCR (optical character recognition) uses algorithms to identify the characters shown in an image, such as a scanned page in a PDF file, so that they can be exported to a plain text document or another supported file format.
PDF OCR is a free program for the Windows operating system that converts PDF documents into editable text.
Update: The most recent free version of PDF OCR is severely limited. The OCR tool processes only three pages, and the image-to-PDF tool adds a large watermark to the resulting PDF document. This makes the free version of the program unusable for most tasks.
The interface is divided into two independent windows. The first window loads the PDF document and displays its contents. All pages are listed on the left, and the PDF can be read directly on screen.
The Start OCR button opens a configuration window for the OCR process. You can OCR all pages, a selection of pages, or only the current page.
Progress and status are displayed in that window, and all processed pages appear in the second window once the process has finished.
The PDF OCR Editor is a basic text editor that can, in theory, be used to edit the text right away. OCR inevitably misreads some characters, and these have to be corrected afterwards.
The editor can also export the converted text as a TXT or DOC file, which opens up a second way of editing it.
It usually makes sense to save the processed PDF as a DOC file and open it in a word processor such as Microsoft Word, which offers spelling and grammar checking to help catch recognition errors.
PDF OCR is a convenient program that offers a fast and easy way to turn PDF documents into text. It supports ten languages and is compatible with all 32-bit and 64-bit editions of Microsoft Windows.
An alternative is Free OCR Scanning, an online service that can process PDF files among other formats.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.