Rip Websites with HTTrack Website Copier - gHacks Tech News

HTTrack is a free program for Windows and various Unix-based operating systems that copies the contents of a website to your local system, downloading all of its pages and files. This is commonly referred to as ripping a website. While that is its greatest strength, you can also use it to quickly download just a single page or a single category of a site.

It is a program for experienced users and for those who do not mind spending time exploring the options it makes available. Its many configuration options may look intimidating at first, but once you get through that initial phase, you will certainly appreciate what it has to offer. Most of the settings are optional and help you deal with special-case websites that use lots of scripting, dynamically generated pages, or authentication.
A question that may come up is why you would want to rip a website to your local system. There are many reasons. Maybe you want to make the contents of the site available for offline browsing, which is useful if the PC you need the information on has no Internet connection, or at least no permanent one. It can also be useful if you know or fear that a site may be taken offline in the near future; the program helps you preserve the information by downloading it all to your system. Last but not least, you can use it to create a local backup of your own site, although there are usually better options available for that.

Saving websites to the local computer

  1. Once you have installed the application, run it and click Next on the first screen.
  2. Name your project and optionally assign a category to it. I recommend using the website's name here.
  3. The base path is the location where the website will be stored. Make sure you have enough disk space available on the drive. Click Next afterwards.
  4. You can now enter one or more web addresses that you want to process. Alternatively, you can load a text file that contains a list of URLs into the program.
  5. The action defines what you want the program to do with the URLs. The default action is to download websites, but you can change it to update an existing download, test the links on a site, and a variety of others. Usually, "download web site(s)" is the right choice here.
  6. Click on Set Options to define preferences. This is important and should not be skipped.
  7. Important preference tabs are Limits, which you use to define the maximum mirroring depth (based on the links the program will follow), and Scan Rules, which you can use to include or exclude specific links or data types.
  8. I recommend going through the other tabs as well to get a basic understanding of the program's functionality. Most settings can be kept at their defaults, though.
  9. You can adjust connection parameters on the next page. Here you can, for instance, have the program shut down the PC or disconnect the Internet connection when it finishes.
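The steps above boil down to a depth-limited crawl: start at the addresses you entered and follow links until the mirroring depth is reached. As a rough, hypothetical illustration of that idea (not HTTrack's actual engine), here is a minimal Python sketch, with a tiny in-memory site standing in for real HTTP requests:

```python
from collections import deque

def mirror(start_url, get_links, max_depth):
    """Breadth-first crawl that follows links up to max_depth hops
    from the start page and collects every page it visits.
    `get_links` stands in for fetching a page and extracting its links."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = []
    while queue:
        url, depth = queue.popleft()
        pages.append(url)          # a real mirror would download and save the page here
        if depth == max_depth:
            continue               # mirroring depth reached, follow links no further
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

# Hypothetical in-memory "website" used instead of real HTTP requests.
site = {
    "/":            ["/about", "/blog"],
    "/about":       ["/"],
    "/blog":        ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
}

print(mirror("/", lambda u: site.get(u, []), max_depth=2))
# → ['/', '/about', '/blog', '/blog/post-1']  (post-2 is 3 hops away)
```

Raising the depth limit by one would pull in `/blog/post-2` as well, which is why the Limits tab matters: each extra level can multiply the number of pages downloaded.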

This should suffice to rip your first website. HTTrack is available for Windows, Unix, Linux and BSD.

The best way to get started with HTTrack is the manual posted on its website, which walks you through copying your first website with the help of the program. You can also check out our tutorials on the subject, e.g. how to save websites to your hard drive or how to rip most websites.

Tips

  1. The program generates a log file whenever it runs an operation. Use it to find errors and issues, and adjust the project accordingly.
  2. You can download 32-bit and 64-bit versions for Windows from the developer website; a portable version is also available. Make sure you run WinHTTrack.
  3. The program supports the HTTPS (SSL) protocol.
  4. The "get files near links" option enables you to download files hosted on third-party websites without configuring the program to crawl those sites as well.
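The scan rules mentioned in step 7 are wildcard patterns prefixed with + (include) or - (exclude), for example `+*.gif` or `-*.zip`. The sketch below mimics that idea with Python's fnmatch; the last-matching-rule-wins behavior is a simplification of my own for illustration, not a statement about HTTrack's exact matching order:

```python
from fnmatch import fnmatch

def allowed(url, rules):
    """Apply scan-rule-style filters: each rule is '+pattern' (include)
    or '-pattern' (exclude). As a simplification, the last rule that
    matches the URL decides whether it is downloaded."""
    verdict = True                     # default: download everything
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatch(url, pattern):
            verdict = (sign == "+")
    return verdict

# Hypothetical rule set: skip archives, except those under /download/,
# and skip anything under an /ads/ path.
rules = ["-*.zip", "+*/download/*.zip", "-*/ads/*"]

print(allowed("http://example.com/files/a.zip", rules))     # → False
print(allowed("http://example.com/download/b.zip", rules))  # → True
print(allowed("http://example.com/ads/banner.gif", rules))  # → False
```

URLs that no rule matches fall through to the default and are downloaded, which is why exclude rules are usually the first thing to add when a project pulls in far more than you wanted.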

Comments

  1. inforpic said on August 17, 2006 at 3:20 am

    Thanks a lot.
    Sounds like good software.

  2. Felipe Tofani said on August 18, 2006 at 3:06 pm

    This software is really useful.

  3. Justin2039 said on March 23, 2008 at 3:22 am

    I have used HTTrack and found it great, on WDS’s Well Designed Sites (not necessarily pretty ones).
    Sites are either Integrated or Disintegrated or every shade between, depending upon the mind set of the designer. I can’t say I’ve found the answer, yet. However a side benefit of HTTrack is it can assist in the discovery of well designed sites (irrespective of the content).

  4. Brayan Habid said on February 28, 2009 at 6:23 am

    What a great piece of software! Thanks a lot!

  5. shubhankar said on June 14, 2014 at 9:43 am

    Does it work for those sites that require a username and password?

    1. Martin Brinkmann said on June 14, 2014 at 9:59 am

      It supports authentication, yes.

  6. employee said on February 2, 2015 at 9:20 pm

    An Android port is available too.
