Rip Websites with HTTrack Website Copier
HTTrack is a free program for Windows and various Unix-based operating systems that you can use to copy the contents of a website to your local system. It enables you to download all pages and information of a website to the local system, which is commonly referred to as ripping a website. While that is its greatest strength, you can also use it to quickly download only a single page or a single category of a website instead.
It is a program for experienced users and those who do not mind spending time exploring all the options it makes available. It offers many configuration options which may look intimidating at first, but once you get through that initial phase, you will certainly appreciate what it has to offer. Most of the settings are optional and help you deal with special-case websites that use lots of scripting, generate pages dynamically, or require authentication.
A question that may come up is why you would want to rip a website to your local system at all. There are many reasons for this. Maybe you want to make the contents of the site available for offline browsing. This can be useful if a PC you need the information on has no Internet connection, or at least no permanent one. It can also be useful if you know or fear that a site may be taken offline in the near future; the program can help you preserve the information by downloading it all to your system. Last but not least, you can use it to create a local backup of your own site, even though there are usually other options available for that.
Saving websites to the local computer
- Once you have installed the application, run it and click Next on the first screen.
- Name your project and assign a category to it (optional). I recommend you use the website's name here.
- The base path is the location where the website will be stored. Make sure you have enough disk space available on the drive, then click Next.
- You can now enter one or more web addresses that you want to process into the form. Alternatively, you can load a text file that contains a list of URLs into the program.
- The action defines what you want the program to do with the URLs. The default action is to download websites, but you can change it to update an existing download, test the links on the site, and a variety of others. Usually, "Download web site(s)" is the right choice here.
- Click on set options to define preferences. This is important and should not be skipped.
- Important preference tabs are Limits, which you use to define the maximum mirroring depth (based on the links that the program will follow), and Scan Rules, which you can use to include or exclude selected links or file types.
- I recommend that you go through the other tabs here as well to get a basic understanding of the program's functionality. Most can be kept at their default levels though.
- You can adjust connection parameters on the next page. Here you can, for instance, choose to shut down the PC when finished or to disconnect the Internet connection.
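If you prefer scripting your mirrors, the main wizard settings roughly correspond to options of the command-line `httrack` tool. The sketch below assumes the `httrack` binary is installed and on your PATH; the helper name `build_httrack_command` is hypothetical, and the option mapping (`-O` for the base path, `-rN` for the maximum mirroring depth, `-n` for "get files near links", and `+`/`-` patterns for scan rules) is based on `httrack --help`, so double-check it against the version you have installed.

```python
import shlex


def build_httrack_command(urls, output_dir, depth=None, scan_rules=(), near=False):
    """Return an argv list for the httrack command-line tool (hypothetical helper).

    Option mapping assumed from `httrack --help`:
      -O <path>   base path for the mirror, cache and log files
      -r<N>       maximum mirroring depth (the "Limits" tab)
      -n          also fetch non-HTML files near links ("get files near links")
      +pattern / -pattern   scan rules that include or exclude URLs
    """
    if not urls:
        raise ValueError("at least one URL is required")
    cmd = ["httrack", *urls, "-O", str(output_dir)]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        cmd.append(f"-r{depth}")
    if near:
        cmd.append("-n")
    for rule in scan_rules:
        # Scan rules start with + (include) or - (exclude).
        if not rule.startswith(("+", "-")):
            raise ValueError(f"scan rule must start with + or -: {rule!r}")
        cmd.append(rule)
    return cmd


if __name__ == "__main__":
    cmd = build_httrack_command(
        ["https://www.example.com/"],
        "/tmp/mirrors/example",
        depth=3,
        scan_rules=["+*.example.com/*", "-*.zip"],
        near=True,
    )
    # Print a copy-pasteable shell version of the command.
    print(shlex.join(cmd))
```

To actually start the mirror, you could pass the list to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list rather than one shell string means the `*` wildcards in scan rules reach `httrack` unchanged instead of being expanded by the shell.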
The HTTrack website offers a step-by-step guide that you can use to get to know the program and its core features. This should suffice to rip your first website. HTTrack is available for Windows and for Unix, Linux and BSD.
The best way to get started with HTTrack is to check out the manual posted on the site which walks you through copying your first website with the help of the program. You can also check out our tutorials on the subject, e.g. how to save websites to your hard drive or how to rip most websites.
- The program generates a log file whenever it runs an operation. Use it to find errors and issues and adjust the project accordingly.
- You can download 32-bit or 64-bit versions for Windows from the developer website. The program is also available as a portable version. Make sure you run WinHTTrack.
- The program supports the https (SSL) protocol.
- The "get files near links" option enables you to download files hosted on third party websites without configuring the program to crawl those third party sites as well.
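The log file mentioned in the tips above can get long on bigger sites. The sketch below pulls the error and warning lines out of it and counts the HTTP status codes found in the errors. The helper names are hypothetical, the default file name `hts-log.txt` in the project folder is an assumption, and so is the line format it relies on (markers such as `Error:` and `Warning:`, with status codes written in parentheses like `(404)`); adjust the markers if your log looks different.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed severity markers; log formats may differ between HTTrack versions.
LEVELS = ("Error", "Warning")
STATUS_RE = re.compile(r"\((\d{3})\)")


def summarize_log(lines):
    """Group log lines by severity, keeping only errors and warnings."""
    found = {level: [] for level in LEVELS}
    for line in lines:
        for level in LEVELS:
            if f"{level}:" in line:
                found[level].append(line.rstrip("\n"))
                break
    return found


def status_counts(lines):
    """Count three-digit HTTP status codes written in parentheses, e.g. (404)."""
    return Counter(m.group(1) for line in lines for m in STATUS_RE.finditer(line))


def summarize_log_file(project_dir, log_name="hts-log.txt"):
    """Read the project's log file (default name assumed) and summarize it."""
    path = Path(project_dir) / log_name
    with path.open(encoding="utf-8", errors="replace") as handle:
        return summarize_log(handle)


if __name__ == "__main__":
    # Hypothetical log lines illustrating the assumed format.
    sample = [
        '10:00:01\tError: \t"Not Found" (404) at link www.example.com/old.html',
        "10:00:02\tWarning: \tFile has moved to www.example.com/new.html",
        "10:00:03\tInfo: \tMirror complete",
    ]
    report = summarize_log(sample)
    for level, entries in report.items():
        print(f"{level}: {len(entries)} line(s)")
    print("Status codes:", dict(status_counts(report["Error"])))
```

A cluster of 404 errors under one path, for example, usually points to a scan rule or depth limit worth revisiting in the project options.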