ghacks Technology News
  • Author: Martin
  • Thursday March 11, 2010

PDF OCR Turns PDF Documents Into Text

Text in a PDF document sometimes cannot be selected in a PDF reader such as Adobe Reader or Foxit Reader. This is usually the case with scanned documents that have been embedded into the PDF file as images. One option for working with the text in those documents is OCR technology.

OCR stands for optical character recognition: an algorithm analyzes the page image and identifies the characters displayed in the PDF file.

PDF OCR is a free program for the Windows operating system that can turn PDF documents into editable text.

The interface is divided into two windows that are independent of each other. The first window loads the PDF document and displays its contents. All pages are listed on the left, and it is possible to read the PDF right on the screen.

The Start OCR button displays a configuration window for the OCR process. It is possible to OCR all pages, a selection of pages or only the current page.
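The three choices (all pages, a selection, or the current page) can be sketched in a few lines of Python. This is only an illustration: the function name `parse_page_selection` and the selection syntax ("all", "current", "1-3,7") are assumptions made for the example and are not taken from PDF OCR's configuration window.

```python
def parse_page_selection(spec, page_count, current_page=1):
    """Return a sorted list of 1-based page numbers to process.

    Hypothetical syntax, invented for this sketch:
      "all"      -> every page of the document
      "current"  -> only current_page
      "1-3,7"    -> comma-separated ranges and single pages
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages = set()
    if spec == "current":
        pages.add(current_page)
    else:
        for part in spec.split(","):
            part = part.strip()
            if not part:
                continue  # tolerate stray commas
            if "-" in part:
                start_text, end_text = part.split("-", 1)
                start, end = int(start_text), int(end_text)
                if start > end:
                    start, end = end, start  # accept "5-2" as "2-5"
                pages.update(range(start, end + 1))
            else:
                pages.add(int(part))

    # Reject pages that do not exist in the document.
    out_of_range = sorted(p for p in pages if not 1 <= p <= page_count)
    if out_of_range:
        raise ValueError(f"pages out of range: {out_of_range}")
    return sorted(pages)
```

For a ten-page document, `parse_page_selection("1-3, 7", 10)` would select pages 1, 2, 3 and 7, while `parse_page_selection("current", 10, current_page=4)` would select only page 4.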

Progress and status are displayed right in the window, and all processed pages are shown in the second window afterwards.

The PDF OCR Editor is a basic text editor that can theoretically be used to edit the text right away. The OCR process inevitably misreads some of the characters, and those have to be corrected afterwards.

The text editor can export the converted text as a text or doc document, which offers a second way of editing the text: in an external application.

It usually makes sense to save the processed pdf as a doc and load it into a text processing application like Microsoft Word which offers spell and grammar checking.
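For exports saved as plain text, a rough first pass over likely OCR misreads can also be scripted. The sketch below uses Python's standard `difflib` module to flag words that are not in a known word list but closely resemble one that is. The function name `flag_suspect_words`, the tiny vocabulary, and the similarity cutoff are assumptions chosen for illustration; it is not part of PDF OCR and is no substitute for a real spell checker.

```python
import difflib
import re


def flag_suspect_words(text, vocabulary, cutoff=0.75):
    """Return (word, suggestion) pairs for probable OCR misreads.

    A word is flagged when it is not in the vocabulary but is similar
    enough (difflib ratio >= cutoff) to a word that is.
    """
    known = sorted({w.lower() for w in vocabulary})  # sorted for stable results
    known_set = set(known)
    suspects = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        lower = word.lower()
        if lower in known_set:
            continue  # recognized word, nothing to flag
        matches = difflib.get_close_matches(lower, known, n=1, cutoff=cutoff)
        if matches:
            suspects.append((word, matches[0]))
    return suspects


# Hypothetical sample: "rn" misread in place of "m" is a typical OCR error.
words = ["the", "scanned", "document", "contains", "text"]
print(flag_suspect_words("The scanned docurnent contains text", words))
```

In practice the vocabulary would come from a full dictionary file, and the cutoff would need tuning: a lower value catches more misreads but also produces more false suggestions.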

PDF OCR is a convenient program that offers its users a fast and easy way of turning pdf documents into text. The program supports ten different languages and is compatible with all 32-bit and 64-bit editions of the Microsoft Windows operating system.

An alternative is Free OCR Scanning, an online service that can process PDF files among other formats.

Responses so far:

  1. PDF OCR says:

    Thank you for your article. I will do my best to add the grammar feature on next version

  2. DanTe says:

    Just tried it. Scanned it with Avira and McAfee, no detections. Tried it on a convoluted government PDF doc. Works beautifully.

    Author of the software might want to note that the install path is pdfPCR. I believe it should be corrected to pdfOCR?

  3. Eric says:

    Tried it too. Great suggestion. Will use often.

  4. Doesn’t Acrobat have its own OCR implementation when you scan documents directly in Acrobat?

  5. Luiz says:

    It shuts down automatically before even OCR begins.
    Does not work, unfortunately.

