<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>gHacks Technology News &#124; Latest Tech News, Software And Tutorials &#187; ocr</title> <atom:link href="http://www.ghacks.net/tag/ocr/feed/" rel="self" type="application/rss+xml" /><link>http://www.ghacks.net</link> <description>A technology news blog covering software, mobile phones, gadgets, security, the Internet and other relevant areas.</description> <lastBuildDate>Fri, 10 Feb 2012 13:29:21 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/> <item><title>PDF OCR Turns PDF Documents Into Text</title><link>http://www.ghacks.net/2010/03/11/pdf-ocr-turns-pdf-documents-into-text/</link> <comments>http://www.ghacks.net/2010/03/11/pdf-ocr-turns-pdf-documents-into-text/#comments</comments> <pubDate>Thu, 11 Mar 2010 11:20:02 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Software]]></category> <category><![CDATA[Windows]]></category> <category><![CDATA[ocr]]></category> <category><![CDATA[pdf]]></category> <category><![CDATA[pdf documents]]></category> <category><![CDATA[pdf ocr]]></category> <category><![CDATA[windows software]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=23622</guid> <description><![CDATA[It sometimes happens that text in a pdf document cannot be selected in a pdf reader like Adobe Reader or Foxit Reader. This is usually the case with scanned documents that have been embedded into the pdf file. One of the options to work with the text in those pdf documents is to use OCR [...]]]></description> <content:encoded><![CDATA[<p>It sometimes happens that text in a pdf document cannot be selected in a pdf reader like Adobe Reader or Foxit Reader. This is usually the case with scanned documents that have been embedded into the pdf file. One of the options to work with the text in those pdf documents is to use OCR technology.</p><p>OCR means optical character recognition which basically makes use of an algorithm to identify the characters displayed in the pdf file.</p><p>PDF OCR is a free software program for the Windows operating system that can turn pdf documents into editable text.</p><p><span
id="more-23622"></span><img
src="http://www.ghacks.net/wp-content/uploads/2010/03/pdf_ocr-500x259.jpg" alt="" title="pdf ocr" width="500" height="259" class="alignnone size-medium wp-image-23624" /></p><p>The interface is divided into two areas that are independent from each other. The first window loads the pdf document and displays its contents in its interface. All pages are displayed on the left and it is possible to read the pdf right on the screen.</p><p>The Start OCR button displays a configuration window for the OCR process. It is possible to OCR all pages, a selection of pages or only the current page.</p><p><img
src="http://www.ghacks.net/wp-content/uploads/2010/03/pdf-500x257.jpg" alt="" title="pdf" width="500" height="257" class="alignnone size-medium wp-image-23625" /></p><p>Progress and status are displayed right in the window, and all processed pages appear in the second window afterwards.</p><p>The PDF OCR Editor is a basic text editor that can be used to edit the text right away. The OCR process inevitably misinterprets some characters, which have to be corrected afterwards.</p><p>The text editor can export the converted text as a text or doc file, which provides a second way of editing the text.</p><p>It usually makes sense to save the processed pdf as a doc and load it into a word processor like Microsoft Word, which offers spell and grammar checking.</p><p><a
href="http://www.pdfocr.net/">PDF OCR</a> is a convenient program that offers its users a fast and easy way of turning pdf documents into text. The program supports ten different languages and is compatible with all 32-bit and 64-bit editions of the Microsoft Windows operating system.</p><p>An alternative is <a
href="http://www.ghacks.net/2009/06/27/free-ocr-scanning/">Free OCR Scanning</a> which is an online service that can process pdf files among others.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2010/03/11/pdf-ocr-turns-pdf-documents-into-text/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Google Docs OCR Demonstration</title><link>http://www.ghacks.net/2009/09/29/google-docs-ocr-demonstration/</link> <comments>http://www.ghacks.net/2009/09/29/google-docs-ocr-demonstration/#comments</comments> <pubDate>Tue, 29 Sep 2009 12:11:26 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Online Services]]></category> <category><![CDATA[The Web]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[google account]]></category> <category><![CDATA[google docs]]></category> <category><![CDATA[google docs ocr]]></category> <category><![CDATA[ocr]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=16769</guid> <description><![CDATA[One cannot really deny the fact that Google is constantly working on new features for their popular online services like Gmail or Google Docs. The latest feature is currently available as a demonstration only and not yet integrated into Google Docs. The Google Docs OCR demonstration can OCR the three image formats jpg, png and [...]]]></description> <content:encoded><![CDATA[<p><img
src="http://www.ghacks.net/wp-content/uploads/2009/09/google_docs.jpg" alt="google docs" title="google docs" width="178" height="54" class="alignleft size-full wp-image-16137" />One cannot really deny the fact that Google is constantly working on new features for their popular online services like Gmail or Google Docs. The latest feature is currently available as a demonstration only and not yet integrated into Google Docs. The Google Docs OCR demonstration can OCR the three image formats jpg, png and gif. Google lists the following limitations that are currently in place:</p><ul><li>Files must be fairly high-resolution &#8212; rule of thumb is 10 pixel character height.</li><li>Maximum file size: 10MB, maximum resolution: 25 mega pixel</li><li>The larger the file, the longer the OCR operation will take (500K: ~15s, 2MB: ~40s, 10MB: forever)</li></ul><p><span
id="more-16769"></span><img
src="http://www.ghacks.net/wp-content/uploads/2009/09/google_docs_ocr-500x325.jpg" alt="google docs ocr" title="google docs ocr" width="500" height="325" class="alignnone size-medium wp-image-16770" /></p><p>Images in a supported format that are uploaded on the demonstration page will be turned into text documents and displayed in Google Docs once the process has been completed. The quality of the result depends largely on the quality of the image. It is usually necessary to look over the text and correct errors that were made during character recognition. Google Docs helps with error correction by underlining unknown words in red in its interface. It still takes some time to correct the errors.</p><p>The OCR demonstration is linked to a Google Docs account but not integrated into Google Docs yet. It is very likely that Google will integrate OCR capabilities into Google Docs in the near future. You can use the demonstration <a
href="http://googlecodesamples.com/docs/php/ocr.php">page</a> for now to test the OCR service.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2009/09/29/google-docs-ocr-demonstration/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Free OCR Scanning</title><link>http://www.ghacks.net/2009/06/27/free-ocr-scanning/</link> <comments>http://www.ghacks.net/2009/06/27/free-ocr-scanning/#comments</comments> <pubDate>Sat, 27 Jun 2009 12:41:16 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Online Services]]></category> <category><![CDATA[documents]]></category> <category><![CDATA[free ocr]]></category> <category><![CDATA[images]]></category> <category><![CDATA[ocr]]></category> <category><![CDATA[ocr scanner]]></category> <category><![CDATA[ocr scanning]]></category> <category><![CDATA[ocr service]]></category> <category><![CDATA[online service]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=13917</guid> <description><![CDATA[If you are looking for an online service that offers free OCR scanning then you might want to point your web browser to the Free Online OCR service. The service allows users to upload images that will then be processed immediately. The OCR scanning does not take longer than a few seconds after uploading the [...]]]></description> <content:encoded><![CDATA[<p><img
src="http://www.ghacks.net/wp-content/uploads/2009/06/free_ocr_scanning.jpg" alt="free ocr scanning" title="free ocr scanning" width="435" height="115" class="alignleft size-full wp-image-13918" />If you are looking for an online service that offers free OCR scanning then you might want to <a
href="http://www.free-ocr.com/">point</a> your web browser to the Free Online OCR service. The service allows users to upload images that will then be processed immediately. The OCR scanning does not take longer than a few seconds after uploading the image. Results are immediately shown in a text form on the same page from where they can be copied and pasted into other software programs or services.</p><p>The free OCR scanning service supports PDF, JPG, GIF, TIFF or BMP files with a maximum file size of two Megabytes. The OCR currently supports the six languages English, German, Spanish, French, Italian and Dutch.</p><p><span
id="more-13917"></span>Best results are achieved if the images have a dpi setting of at least 150. That&#8217;s problematic when taking screenshots as these usually are taken at a lower value.</p><p>Results range from impressive to workable and it is a good idea to check the recognized text and correct any errors made during the OCR scan. Free OCR has a few additional limitations that are mentioned in the faq on the website. Probably the two biggest restrictions are a one page limitation when scanning pdf documents and that it will not recognize document layouts which means that a two column layout will be processed as a single column layout.</p><p>The developers of Free OCR promised to update their service in the near future to remove these restrictions and limitations. Thanks go to JoJo for sending in the tip.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2009/06/27/free-ocr-scanning/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>Document Imaging Software JOCR</title><link>http://www.ghacks.net/2009/03/15/document-imaging-software-jocr/</link> <comments>http://www.ghacks.net/2009/03/15/document-imaging-software-jocr/#comments</comments> <pubDate>Sun, 15 Mar 2009 19:11:37 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Software]]></category> <category><![CDATA[Windows]]></category> <category><![CDATA[document imaging]]></category> <category><![CDATA[document imaging software]]></category> <category><![CDATA[jocr]]></category> <category><![CDATA[microsoft-office]]></category> <category><![CDATA[ocr]]></category> <category><![CDATA[office tools]]></category> <category><![CDATA[portable software]]></category> <category><![CDATA[windows software]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=11247</guid> <description><![CDATA[JOCR is a specialized optical character recognition (ocr) software that can recognize characters from images taken on the computer&#8217;s desktop. Before we delve into the functionality it should be noted that JOCR requires the Microsoft Office Document Imaging component which is a component of Microsoft Office 2003 and newer Office versions. The Office Tool has [...]]]></description> <content:encoded><![CDATA[<p>JOCR is a specialized optical character recognition (ocr) software that can recognize characters from images taken on the computer&#8217;s desktop. Before we delve into the functionality it should be noted that JOCR requires the Microsoft Office Document Imaging component which is a component of Microsoft Office 2003 and newer Office versions. The Office Tool has to be installed for JOCR to function.</p><p>JOCR itself is a portable software program. Its interface resembles that of a screen capture program. It offers to capture a region, the desktop or the active window. The image will then be displayed in the program&#8217;s interface with options to print, copy or recognize.</p><p>Recognizing is obviously the main part as it will analyze the image at hand to recognize all the characters that it contains. The document imaging software will then report the number of recognized characters and open the text in the default text editor, where it can be edited or copied easily.</p><p><span
id="more-11247"></span><img
src="http://www.ghacks.net/wp-content/uploads/2009/03/document_imaging_software-500x293.jpg" alt="document imaging software" title="document imaging software" width="500" height="293" class="alignnone size-medium wp-image-11248" /></p><p>The document imaging software can be useful in situations where text has to be copied from interface elements. This can be error messages, text that is displayed in applications or images. The recognition rate depends largely on the type of image and text used. It ranges from brilliant (almost no editing required) to weak (failed to recognize certain characters, lots of editing required).</p><p>JOCR is compatible with the following languages: Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2009/03/15/document-imaging-software-jocr/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>OCR Document Scanning With Smartphones</title><link>http://www.ghacks.net/2009/03/12/ocr-document-scanning-with-smartphones/</link> <comments>http://www.ghacks.net/2009/03/12/ocr-document-scanning-with-smartphones/#comments</comments> <pubDate>Thu, 12 Mar 2009 14:07:56 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Software]]></category> <category><![CDATA[Windows]]></category> <category><![CDATA[digital camera]]></category> <category><![CDATA[document scanning]]></category> <category><![CDATA[ocr]]></category> <category><![CDATA[ocr scanning]]></category> <category><![CDATA[ocr software]]></category> <category><![CDATA[smartphones]]></category> <category><![CDATA[text recognition]]></category> <category><![CDATA[top ocr]]></category> <category><![CDATA[windows software]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=11144</guid> <description><![CDATA[The common way to scan documents is to use a hardware scanner to do so and a text recognition software afterwards using optical character recognition (ocr). The advent of digital cameras and smartphones created alternative means for OCR document scanning. A good enough digital camera is capable of photographing the document which can then be [...]]]></description> <content:encoded><![CDATA[<p>The common way to scan documents is to use a hardware scanner to do so and a text recognition software afterwards using optical character recognition (ocr). The advent of digital cameras and smartphones created alternative means for OCR document scanning. A good enough digital camera is capable of photographing the document which can then be processed by OCR software programs like TOP OCR.</p><p><a
href="http://www.topocr.com/topocr.html">TOP OCR</a> processes images of documents that have been taken by scanners, digital cameras or smartphones. It basically lets users emulate the usual document scanning that is done by a hardware scanner with images taken by digital cameras instead.</p><p>The OCR application processes images that the user loads into the program. It displays the scanned document in the left window and the recognized text in the right window. The developers of the OCR software recommend at least a 3 megapixel camera to take the image of the document. They have set up a <a
href="http://www.topocr.com/mtutorial.html">tutorial</a> page with many helpful tips on getting the best results.</p><p><span
id="more-11144"></span><img
src="http://www.ghacks.net/wp-content/uploads/2009/03/ocr_document_scanning-500x233.jpg" alt="ocr document scanning" title="ocr document scanning" width="500" height="233" class="alignnone size-medium wp-image-11146" /></p><p>The OCR document scanning process itself takes only a few seconds per page. Results are instantly shown in the right window which offers basic text editing capabilities to correct any errors in the automatic text recognition process.</p><p>The OCR part of the software program is providing basic image manipulation features like rotating the scanned images or changing the image contrast. TOP OCR is a multi-lingual OCR software program for the Windows operating system that produces impressive results if the source image is of good quality.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2009/03/12/ocr-document-scanning-with-smartphones/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Capture image and convert it to text</title><link>http://www.ghacks.net/2007/03/12/capture-image-and-convert-it-to-text/</link> <comments>http://www.ghacks.net/2007/03/12/capture-image-and-convert-it-to-text/#comments</comments> <pubDate>Mon, 12 Mar 2007 04:42:43 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Operating Systems]]></category> <category><![CDATA[Tools]]></category> <category><![CDATA[Windows]]></category> <category><![CDATA[character-recognition]]></category> <category><![CDATA[freeware]]></category> <category><![CDATA[image-capture]]></category> <category><![CDATA[image-ocr]]></category> <category><![CDATA[jocr]]></category> <category><![CDATA[ocr]]></category> <category><![CDATA[scan-image]]></category> <category><![CDATA[scan-text]]></category> <guid
isPermaLink="false">http://www.ghacks.net/2007/03/12/capture-image-and-convert-it-to-text/</guid> <description><![CDATA[I was not really sure what to title this article. Jocr is a freeware program that makes it possible to capture a set region, a window or a full screen image in Windows and use character recognition to write the text of the image into a notepad file. The only prerequisite as far as I can tell is, unfortunately, a copy of Microsoft Office 2003 or newer with Microsoft Office Document Imaging installed which you can find under the Office Tools tab of the installation CD. The language that you are using has to be supported by Microsoft Office Document Imaging; about 20 are supported in addition to English.]]></description> <content:encoded><![CDATA[<p>I was not really sure what to title this article. Jocr is a freeware program that makes it possible to capture a set region, a window or a full screen image in Windows and use character recognition to write the text of the image into a notepad file. The only prerequisite as far as I can tell is, unfortunately, a copy of Microsoft Office 2003 or newer with Microsoft Office Document Imaging installed which you can find under the Office Tools tab of the installation CD. The language that you are using has to be supported by Microsoft Office Document Imaging; about 20 are supported in addition to English.</p><p><img
src="http://www.ghacks.net/files/screens/2007/03/jocr.jpg" align="left" alt="jocr character recognition" />Using <a
href="http://home.megapass.co.kr/~woosjung/Index_Download.html" target="_blank">Jocr</a> is actually a pretty straightforward process. Choose if you want to capture a region, window or desktop and use the mouse to draw the region or to highlight a window that should be captured. It is not necessary to select something if you choose to capture the whole desktop of course. A preview of the captured image will be shown in Jocr and all that is left to be done is to click on recognize to start the character recognition.</p><p><span
id="more-1293"></span></p><p>Results are actually pretty good. They are not perfect, however, and you need to manually correct errors, but the recognition rate is astounding for a freeware program. The main use that I can foresee for this software would be to capture text from files that cannot be copied and use Jocr to get an editable version of the text.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2007/03/12/capture-image-and-convert-it-to-text/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> </channel> </rss>
