There are several reasons why you might want to convert local or online HTML files to the plain text format (.txt). Maybe you want to move the files to a device that can't read or display HTML files properly, maybe you'd like to turn multiple HTML documents into a single text document for easier archiving, or maybe you just need the textual information from the documents for work.
You can use copy and paste to do that, or go through the source code manually, but you will quickly realize that either approach takes time. Going through the source code is usually not the best option, as you may end up copying HTML tags to the new document, and a plain text file does not interpret them. Depending on the HTML file's structure, you may also have trouble copying its textual contents when you view it in a browser.
Nirsoft's HTMLAsText comes to the rescue as it provides you with an automated way of converting HTML files to plain text. The program has been designed to work with single and multiple HTML files as long as the documents are stored in a single folder or folder structure on your hard drive. You can use wildcards to select the HTML files on your drive, and wildcards for the corresponding text files as well.
You select the HTML root folder and define whether you want to convert a single file or multiple files using wildcards. If you have HTML documents in a subfolder, select the scan subfolder option here as well.
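If you are curious what this kind of file selection looks like in a script, here is a minimal Python sketch. It is not how HTMLAsText works internally; the function names find_html_files and text_path_for, and the default *.html pattern, are assumptions made up for this illustration.

```python
from pathlib import Path


def find_html_files(root, pattern="*.html", scan_subfolders=False):
    """Return the files under root that match a wildcard pattern.

    Hypothetical helper: with scan_subfolders=True the search also
    descends into every subfolder of root.
    """
    root = Path(root)
    matches = root.rglob(pattern) if scan_subfolders else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def text_path_for(html_path):
    """Map an HTML file to a .txt file with the same name and folder."""
    return Path(html_path).with_suffix(".txt")
```

Calling find_html_files("site", scan_subfolders=True) would list every matching file below a folder named site, and text_path_for gives the output name for each of them.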
The conversion options define several output parameters. Here you can select the maximum number of characters per line and the characters that should represent unordered list items. HTMLAsText not only extracts the text from HTML documents but preserves part of the document formatting as well.
Additional formatting-related options let you highlight heading tags (h1 to h6) with underlines, skip the title tag, enclose bold text in characters you select, and allow centered or right-aligned text.
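To give an idea of what such options do to the output, here is a rough Python sketch built on the standard library's html.parser and textwrap modules. It is an illustration under stated assumptions, not HTMLAsText's actual method: the names TextExtractor and html_to_text and the parameters width, bullet and bold_mark are hypothetical, the title tag is always skipped, and nested or aligned content is not handled.

```python
from html.parser import HTMLParser
import textwrap


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document as (kind, text) blocks."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    BLOCKS = HEADINGS | {"p", "div", "li", "ul", "ol", "br", "tr"}
    SKIPPED = {"script", "style", "title"}

    def __init__(self, bold_mark="*"):
        super().__init__()
        self.bold_mark = bold_mark
        self.blocks = []      # finished blocks: ("heading" | "item" | "p", text)
        self.buffer = []      # text pieces of the block being read
        self.kind = "p"
        self.skip_depth = 0   # > 0 while inside script, style or title

    def flush(self):
        # Collapse runs of whitespace and store the block if it has text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append((self.kind, text))
        self.buffer = []
        self.kind = "p"

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.BLOCKS:
            self.flush()
            if tag in self.HEADINGS:
                self.kind = "heading"
            elif tag == "li":
                self.kind = "item"
        elif tag in ("b", "strong"):
            self.buffer.append(self.bold_mark)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCKS:
            self.flush()
        elif tag in ("b", "strong"):
            self.buffer.append(self.bold_mark)

    def handle_data(self, data):
        if not self.skip_depth:
            self.buffer.append(data)


def html_to_text(html, width=80, bullet="*", bold_mark="*"):
    """Convert an HTML string to wrapped plain text."""
    parser = TextExtractor(bold_mark)
    parser.feed(html)
    parser.close()
    parser.flush()

    lines = []
    previous = None
    for kind, text in parser.blocks:
        # Blank line between blocks, except between adjacent list items.
        if previous is not None and not (previous == kind == "item"):
            lines.append("")
        if kind == "heading":
            lines.append(text)
            lines.append("-" * min(len(text), width))  # underline headings
        elif kind == "item":
            indent = " " * (len(bullet) + 1)
            lines.append(textwrap.fill(text, width,
                                       initial_indent=bullet + " ",
                                       subsequent_indent=indent))
        else:
            lines.append(textwrap.fill(text, width))
        previous = kind
    return "\n".join(lines) + "\n"
```

Passing a smaller width wraps paragraphs sooner, while bullet and bold_mark change the characters used for list items and bold text, which mirrors the kind of choices the program's options dialog offers.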
You can save the configuration and load it at any time in the future, which may be useful if you need to convert HTML documents to text regularly.
The conversion itself does not take longer than a second for a single document, and the quality of the output is quite good. While you may still need to manually edit the text document, for instance by removing navigational elements or menus that you do not need, the program's formatting preservation helps to limit that to a fraction of the time you'd normally spend doing so.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.