PDF Masher, Turn PDF Documents Into HTML Documents - gHacks Tech News

PDF Masher, Turn PDF Documents Into HTML Documents

PDF Masher has been designed for users who read ebooks on their mobile devices. PDF is not the best format for that purpose because its layout is fixed: it is not possible to change a document's font size, for instance. While it is possible to use the device's zoom function to read the document, that is usually not a very comfortable option, especially for large documents.

The HTML format offers an alternative. While it is often not that pretty to look at, it provides better controls to read and work with the text of a document. Tools like Calibre can convert PDF documents into various formats. Their disadvantage is that they often do not get it completely right, so that headers, footers and other text that is not needed for reading end up in the converted document.

Enter PDF Masher. The open source software turns PDF documents into HTML pages. Instead of relying on guesswork or an algorithm to decide which text to extract from the PDF document, it asks the user to identify and select the text that should be included in the resulting document.

You can load a PDF document via the Open File button at the top of the interface. PDF Masher scans the document and displays all the text that it finds in a table-like structure.

The sortable table displays the font size, x and y position, text length and the text itself, among other data. This makes it relatively easy to identify the text that you want included in the resulting document. A click on a row displays that row's text in the lower half of the screen. Here it is possible to add, edit or delete text directly. That is helpful if the automatic text detection made mistakes that need to be corrected.
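To illustrate why a sortable table of this kind helps, here is a minimal Python sketch. It is not PDF Masher's own code: the TextElement class, its field names and the sample data are all assumptions made up for this example. It only shows how sorting extracted text by font size or y position tends to group headings and repeated page headers together.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    """One piece of extracted text (hypothetical structure, for illustration)."""
    page: int
    x: float
    y: float
    font_size: float
    text: str

    @property
    def length(self) -> int:
        # Text length, one of the columns mentioned in the article.
        return len(self.text)


def sort_elements(elements, column, reverse=False):
    """Return the elements sorted by the named column, like clicking a header."""
    return sorted(elements, key=lambda e: getattr(e, column), reverse=reverse)


# Invented sample data: a running header repeats at the same y on each page.
elements = [
    TextElement(1, 72.0, 750.0, 9.0, "My Book Title"),
    TextElement(1, 72.0, 700.0, 18.0, "Chapter 1"),
    TextElement(1, 72.0, 650.0, 11.0, "It was a dark and stormy night."),
    TextElement(2, 72.0, 750.0, 9.0, "My Book Title"),
    TextElement(2, 72.0, 650.0, 11.0, "The rain fell in torrents."),
]

# Largest font first: chapter headings rise to the top.
by_size = sort_elements(elements, "font_size", reverse=True)

# Highest y first: the repeated running header ends up in adjacent rows.
by_y = sort_elements(elements, "y", reverse=True)
```

Once sorted this way, rows that share a position or font size sit next to each other, which is what makes selecting them in bulk practical.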

It is furthermore possible to ignore single or multiple text elements so that they do not turn up in the new document.

Lines can also be set as footnotes and titles. Footnotes, for instance, are automatically moved to the last page of the document so that they do not appear in the main text.
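The idea of ignored lines, titles and footnotes can be sketched in a few lines of Python. This is only an illustration under assumed names (the build_html function, the state labels and the chosen HTML tags are all hypothetical), not the way PDF Masher actually generates its output.

```python
from html import escape


def build_html(elements):
    """Build simple HTML from (state, text) pairs given in reading order.

    Assumed states: "normal", "ignored", "title", "footnote".
    """
    body = []
    footnotes = []
    for state, text in elements:
        if state == "ignored":
            continue  # ignored lines never reach the output
        if state == "title":
            body.append(f"<h1>{escape(text)}</h1>")
        elif state == "footnote":
            # Collect footnotes so they can be placed at the very end.
            footnotes.append(f"<li>{escape(text)}</li>")
        else:
            body.append(f"<p>{escape(text)}</p>")
    if footnotes:
        body.append("<ol>" + "".join(footnotes) + "</ol>")
    return "\n".join(body)


# Invented sample input.
page = build_html([
    ("ignored", "Page 1"),
    ("title", "Chapter 1"),
    ("normal", "It was a dark & stormy night."),
    ("footnote", "A well-known opening line."),
    ("normal", "The rain fell in torrents."),
])
```

In this sketch the footnote is written after the last paragraph even though it appeared between two paragraphs in the input, and the page number marked as ignored is dropped entirely.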

The developer has created a small video that demonstrates the program's functionality.

PDF Masher is a handy program for users who want better control and readability on their mobile devices. The manual conversion may take longer than an automatic conversion, but it ensures that the resulting document is easier to read.

Users who want to convert multiple documents at once need to look at other programs for the job. If it is just one document, then PDF Masher is one of the best options, provided that you are fine with the resulting HTML format.

PDF Masher is available for Mac OS X, Linux and Windows operating systems. It can be downloaded from the developer website.





    Comments

    1. computerfella said on July 13, 2011 at 12:25 am
      You should have mentioned you are limited to 40+ hours of use before it quits working. I had downloading stuff only to find out it is begware.

    2. nad rosenberg said on July 27, 2011 at 5:22 pm
      How does PDF Masher handle bullets? How about tables?
