Convert HTML files to Plain Text properly

Martin Brinkmann
Dec 26, 2012
Software

There are several reasons why you may want to convert local or online HTML files to plain text (.txt). Maybe you want to move the files to a device that can't display HTML properly, maybe you'd like to merge multiple HTML documents into a single text document for easier archiving, or maybe you just need the text of the documents for work.

You could use copy and paste to do that, or go through the source code manually, but you will quickly realize that both take time. Going through the source code is usually not the best option, as you may end up copying HTML tags into the new document, and those tags are not interpreted in a plain text file. Depending on the structure of the HTML file, you may also have issues copying its text when you view it in a browser.

Nirsoft's HTMLAsText comes to the rescue, as it provides you with an automated way of converting HTML files to plain text. The program has been designed to work with single and multiple HTML files, as long as the documents are stored in a single folder or folder structure on your hard drive. You can use wildcards to select the HTML source files and to name the corresponding text files.

You simply select the HTML root folder and define whether you want to convert a single file or multiple files using wildcards. If you have HTML documents in subfolders, select the scan subfolders option here as well.
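If you would rather script the file selection yourself, the same idea (a root folder, a wildcard, and an optional subfolder scan) can be sketched in a few lines of Python. This is only an illustration and not how HTMLAsText works internally; the function names `find_html_files` and `text_target` are made up for this example.

```python
from pathlib import Path


def find_html_files(root, pattern="*.html", scan_subfolders=False):
    """Return the files under root whose names match the wildcard pattern.

    Hypothetical helper: scan_subfolders mirrors the idea of a
    "scan subfolders" option, it is not taken from HTMLAsText.
    """
    root = Path(root)
    # rglob() walks the whole folder tree, glob() only looks at root itself.
    matches = root.rglob(pattern) if scan_subfolders else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def text_target(html_path, out_dir):
    """Map an HTML file such as page.html to out_dir/page.txt."""
    return Path(out_dir) / (Path(html_path).stem + ".txt")
```

Calling `find_html_files("C:/pages", "*.htm*", scan_subfolders=True)` would, under these assumptions, pick up both .htm and .html files in the folder and everything below it.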


The conversion options define several output parameters. Here you can select the maximum number of characters per line and the characters used to represent unordered list items. HTMLAsText not only extracts the text from HTML documents but preserves part of the document formatting as well.

Additional formatting options let you highlight heading tags (h1 to h6) with underlines, skip the title tag, enclose bold text in characters you select, and allow centered or right-aligned text.
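To get a feel for what options like these do, here is a rough Python sketch of an HTML to text converter with a line width, a bullet character, underlined headings, a bold marker and a skipped title tag. It uses only the standard library and is an assumption-laden illustration, not HTMLAsText's actual code; the class `TextExtractor`, the function `html_to_text` and all parameter names are invented for this example, and centered or right-aligned text is not handled.

```python
import textwrap
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text blocks from HTML; all option names are hypothetical."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    BLOCKS = {"p", "div", "ul", "ol"}

    def __init__(self, width=80, bullet="*", bold_marker="*", skip_title=True):
        super().__init__(convert_charrefs=True)
        self.width = width
        self.bullet = bullet
        self.bold_marker = bold_marker
        # Script and style content is never text; the title is optional.
        self.skipped = {"script", "style"} | ({"title"} if skip_title else set())
        self.blocks = []      # finished (kind, text) pairs
        self.buf = []         # text pieces of the block being read
        self.kind = "p"       # "p", "heading" or "item"
        self.skip_depth = 0

    def flush(self, next_kind="p"):
        # Collapse runs of whitespace, store the block, start a new one.
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append((self.kind, text))
        self.buf = []
        self.kind = next_kind

    def handle_starttag(self, tag, attrs):
        if tag in self.skipped:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.flush("heading")
        elif tag == "li":
            self.flush("item")
        elif tag in self.BLOCKS:
            self.flush("p")
        elif tag == "br":
            self.buf.append(" ")
        elif tag in ("b", "strong"):
            self.buf.append(self.bold_marker)

    def handle_endtag(self, tag):
        if tag in self.skipped:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag == "li" or tag in self.BLOCKS:
            self.flush("p")
        elif tag in ("b", "strong"):
            self.buf.append(self.bold_marker)

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)

    def render(self):
        self.flush()
        out = []
        for kind, text in self.blocks:
            if kind == "heading":
                # Underline the heading with "=" as wide as its longest line.
                lines = textwrap.wrap(text, self.width)
                out.append("\n".join(lines) + "\n" + "=" * max(map(len, lines)))
            elif kind == "item":
                prefix = self.bullet + " "
                out.append(textwrap.fill(text, self.width,
                                         initial_indent=prefix,
                                         subsequent_indent=" " * len(prefix)))
            else:
                out.append(textwrap.fill(text, self.width))
        return "\n\n".join(out) + "\n"


def html_to_text(html, **options):
    parser = TextExtractor(**options)
    parser.feed(html)
    parser.close()
    return parser.render()
```

Something like `html_to_text(page_source, width=60, bullet="-")` would then wrap every paragraph at 60 characters and start each list item with a dash. Navigation menus and similar page elements would still end up in the output, so manual cleanup remains necessary with a simple approach like this one.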

You can save the configuration and load it again at any time, which may be useful if you need to convert HTML documents to text regularly. The conversion itself does not take longer than a second for a single document, and the quality of the output is quite good. While you may still need to manually edit the text document, for instance by removing navigational elements or menus that you do not need, the program's formatting preservation helps to limit that to a fraction of the time you'd normally spend doing so.


Comments

  1. Jim said on December 27, 2012 at 11:36 am

    @tom
    I’ve been using a program called Pearl Mountain Image Converter, which will convert lyr to jpg etc. (I don’t use it for this but for other reasons). Anyway, I’ll give you my key, as there’s a watermark on the trial version and you seem to need to convert the lyr badly.

    download from here: http://www.pearlmountainsoft.com/pearlmountain-image-converter/index.html

    my serial http://pastebin.com/r9kNw4G1
    cheers

  2. jmjsquared said on December 26, 2012 at 9:26 pm

    @tom – Give this a try: ArcGIS Explorer Desktop. It’s part of a mapping suite of software, is free and opens MXD LYR 3DD files.

    http://www.esri.com/software/arcgis/explorer/download

  3. Shawn said on December 26, 2012 at 3:47 pm

    @tom…

    Which program are the lyr files from? .lyr can be lyrics, DataCad, GPS imagery files, and the extension is even used in the medical field.

    Knowing the source of the file will help the rest of us help you out…

  4. tom said on December 26, 2012 at 12:59 pm

    Martin.

    merry christmas

    Recently I came across a case where I needed to convert .lyr to JPG, but there seems to be no free software available. Do you remember any such thing?
