PDF Masher, Turn PDF Documents Into HTML Documents - gHacks Tech News

PDF Masher has been designed for users who read ebooks on their mobile devices. PDF is not the best format for that purpose because, for instance, it is not possible to change a document's font size. While it is possible to use the device's zoom function to read the document, that is usually not very comfortable, especially for large documents.

The HTML format offers an alternative. While it is often not that pretty to look at, it provides better controls to read and work with the text of a document. Tools like Calibre can convert PDF documents into various formats. Their disadvantage is that they often do not get it completely right, so that headers, footers and other text that is not needed to read the document end up in the output.

Enter PDF Masher. The open source software turns PDF documents into HTML pages. Instead of relying on guesswork or an algorithm to decide which text belongs in the output, it asks the user to identify and select the text that should be available in the resulting document.

You can load a PDF document via the Open File button at the top of the interface. PDF Masher scans the document and displays all the text that it found in a table-like structure.

The sortable table displays the font size, x and y position, text length and the text itself, among other data. This makes it relatively easy to identify the text that you want included in the resulting document. A click on a row displays that row's text in the lower half of the screen, where it is possible to add, edit or delete text directly. That is helpful if the automatic text detection made mistakes that need to be corrected.

It is furthermore possible to mark single or multiple text elements as ignored so that they do not turn up in the new document.
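The idea of ignoring text based on the table's columns can be sketched in a few lines of Python. This is only an illustration, not PDF Masher's actual code: the `TextElement` class, the `ignore_margins` function and the threshold values are hypothetical, and the sketch assumes that the y position is measured from the top of the page.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical stand-in for one row of the table described above.
    page: int
    font_size: float
    x: float
    y: float
    text: str
    ignored: bool = False


def ignore_margins(elements, top, bottom):
    """Mark elements outside the vertical band [top, bottom] as ignored
    and return the elements that remain."""
    for el in elements:
        if el.y < top or el.y > bottom:
            el.ignored = True
    return [el for el in elements if not el.ignored]


elements = [
    TextElement(1, 8.0, 72.0, 20.0, "My Ebook Title"),             # running header
    TextElement(1, 11.0, 72.0, 120.0, "Chapter one begins here."),  # body text
    TextElement(1, 8.0, 300.0, 780.0, "1"),                        # page number
]

kept = ignore_margins(elements, top=50.0, bottom=750.0)
print([el.text for el in kept])
```

In the program itself this selection is done by hand in the table, for example by sorting on the y position and marking the matching rows, which is what gives the user control over the result.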

Lines can also be set as footnotes and titles. Footnotes, for instance, are automatically moved to the end of the document so that they do not interrupt the main text.
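How marked lines might end up in the HTML output can be shown with a small Python sketch. It is an assumption-laden illustration rather than the program's real output format: the `build_html` function, the kind labels and the choice of `<h1>`, `<p>` and `<ol>` tags are all hypothetical.

```python
from html import escape


def build_html(lines):
    """Turn (kind, text) pairs into an HTML fragment.

    kind is 'title', 'footnote' or anything else for normal text.
    Footnotes are collected and appended after the main text.
    """
    body, footnotes = [], []
    for kind, text in lines:
        if kind == "title":
            body.append(f"<h1>{escape(text)}</h1>")
        elif kind == "footnote":
            footnotes.append(f"<li>{escape(text)}</li>")
        else:
            body.append(f"<p>{escape(text)}</p>")
    if footnotes:
        body.append("<ol>" + "".join(footnotes) + "</ol>")
    return "\n".join(body)


result = build_html([
    ("title", "Intro"),
    ("footnote", "See page 3."),
    ("normal", "Text & more."),
])
print(result)
```

The footnote appears after the paragraph even though it came before it in the input, which mirrors the behavior described above.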

The developer has created a short video that demonstrates the program's functionality.

PDF Masher is a handy program for users who want better control and readability on their mobile devices. The manual conversion may take longer than an automatic conversion, but it ensures that the resulting document is easier to read.

Users who want to convert multiple documents at once need to look at other programs for the job. If it is just one document, then PDF Masher is one of the best options, provided that you are fine with the resulting HTML format.

PDF Masher is available for Mac OS X, Linux and Windows operating systems. It can be downloaded from the developer website.

Comments

  1. computerfella said on July 13, 2011 at 12:25 am

    You should have mentioned you are limited to 40+ hours of use before it quits working. I hate downloading stuff only to find out it is begware.

  2. nad rosenberg said on July 27, 2011 at 5:22 pm

    How does PDF Masher handle bullets? How about tables?
