Ripping a website means creating a local copy of it for offline browsing. Creating a website mirror can be a good idea for several reasons. Even with all the caches that store website information, much is lost when a website disappears for good. A mirror is also handy if you need the information on a computer with no Internet access, or only temporary Internet access, for example during an HTML course.
One of the most efficient ways to rip websites is the program HTTrack, which might look a little confusing at first because of its many options. I would like to walk you through the process of ripping a website. Please note that this method does not work on all websites, but it works on most.
To begin, download and install HTTrack Website Copier. Start it once it has been installed, and you will be greeted with a new project dialog. Each project creates the offline copy of one or more URLs.

The first screen manages the properties of the project. Just add a name (I prefer the name of the website that I want to rip) and a location on your hard drive where you want to save it. Make sure you have enough free disk space on that hard drive. Click Next to continue.
On the next screen you add URLs and choose the kind of action that you want HTTrack to perform. The standard action downloads a copy of the website and makes it available offline. The most important element here is the Set Options button, which opens the configuration for the project.
It is very important to open the options and make some changes there. Click on the Browser ID tab and change the ID to Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0). Some websites check for the default ID of HTTrack and deny access to it. Changing the ID prevents that from happening.
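To illustrate what the Browser ID setting amounts to, here is a minimal Python sketch that prepares an HTTP request carrying the same identification string as its User-Agent header. It is not how HTTrack does it internally; the helper name `build_request` is made up for this example, and nothing is sent over the network.

```python
import urllib.request

# Browser ID string suggested in the article.
BROWSER_ID = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"


def build_request(url: str, browser_id: str = BROWSER_ID) -> urllib.request.Request:
    """Return a request object that carries a custom User-Agent header.

    build_request is a hypothetical helper for illustration only.
    """
    return urllib.request.Request(url, headers={"User-Agent": browser_id})


if __name__ == "__main__":
    req = build_request("http://www.example.com/")
    # No network traffic happens here; we only inspect the prepared header.
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

A server that filters by this header sees the browser string instead of the tool's default name, which is the effect the Browser ID tab aims for.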
Next, open the Limits tab. Select the maximum mirroring depth and the maximum external depth. The first defines how many levels of links are followed, starting from the homepage. If you set it to 2, for instance, the homepage is scanned, then page 1, which is linked from the homepage, and then page 11, which is linked from page 1.
If you leave the first option blank, all links on that website are scanned. External links are not scanned by default, which can be changed in the same tab. I suggest leaving it that way, because following external links would really bloat the project. Also raise the maximum transfer rate in the same tab to its maximum to ensure faster downloads.
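The effect of a depth limit can be sketched with a small breadth-first walk over a link graph. This is only an illustration: the `LINKS` graph and the function `pages_within_depth` are invented for the example, and the depth is counted the way the article describes it (a value of 2 reaches page 11), which may not match HTTrack's own counting exactly.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["page1", "page2"],
    "page1": ["page11"],
    "page2": [],
    "page11": ["page111"],
    "page111": [],
}


def pages_within_depth(start, max_hops, links=LINKS):
    """Breadth-first walk that stops following links after max_hops hops.

    max_hops=None means no limit, like leaving the option blank.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, hops + 1))
    return order


if __name__ == "__main__":
    print(pages_within_depth("home", 2))     # stops before page111
    print(pages_within_depth("home", None))  # follows every link
```

With a limit of 2 the walk collects home, page1, page2 and page11 but never reaches page111; with no limit it collects every page in the graph.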
The Scan Rules tab is another important one. You can include and exclude files here. If you do not want to download .exe files, for instance, enter the string “-*.exe” (without the quotation marks) in the form.
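The idea behind such include and exclude rules can be sketched in a few lines of Python. This is a rough approximation, not HTTrack's actual matcher: it uses the wildcard matching from Python's `fnmatch` module, the function `is_allowed` is made up for the example, and it assumes that the last matching rule decides.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=True):
    """Apply "+pattern" / "-pattern" rules in order.

    Assumption of this sketch: the last rule that matches decides.
    """
    allowed = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


if __name__ == "__main__":
    rules = ["-*.exe"]
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/setup.exe"):
        print(url, "->", "download" if is_allowed(url, rules) else "skip")
```

A later rule such as “+*/tools/*.exe” would, under the same assumption, re-include executables from one directory while the rest stay excluded.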
Password-Protected Websites:
Password-protected websites are usually harder to rip. You need to supply HTTrack with the username and password for that website. The easiest way to do so is to add them to the URL in the main menu. Instead of adding the URL http://www.example.com/ you would add it this way: http://username:password@www.example.com/
That works for websites with basic authentication, meaning pop-ups that ask for a username and password. It is more difficult if the website uses form-based logins. Your best option for ripping those websites is to click on the Add URL button in the main menu and use the capture URL feature.
This requires you to set a proxy in your browser for a short time and log in to the website that you want to rip, so that HTTrack can observe how the login is done and, hopefully, emulate it when ripping the website.
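For the basic-authentication case, the URL form shown above can be built programmatically. The following Python sketch uses only the standard library; `add_credentials` is a hypothetical helper, and percent-encoding special characters in the username or password is the general URL convention, so whether a particular tool decodes them correctly is an assumption worth testing.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_credentials(url, username, password):
    """Return url with username:password@ inserted before the host.

    Hypothetical helper; handles plain host names and optional ports.
    """
    parts = urlsplit(url)
    # Percent-encode characters such as "@" or ":" so they cannot be
    # confused with the separators of the URL itself.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{userinfo}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(add_credentials("http://www.example.com/", "username", "password"))
```

Keep in mind that a URL built this way contains the password in plain text, so it should not be shared or left in saved project files on a shared machine.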
10 Responses to “How to rip most websites”
. . . or just use the Firefox Scrapbook extension.
or just use wget / curl with the right parameters
Dan, I only looked briefly at Scrapbook. Are you sure you can save all pages of a URL with it automatically? It looks more like saving one page at a time.
Gokudomatic, can you point me to a good tutorial that explains how this is done? That would be awesome, thanks.
With Scrapbook it is not possible to save pages you have not loaded. But you can save all the open tabs, though.
In ScrapBook, you can also set the depth of links to follow and it will save all relevant pages. Not the same thing, to be sure, but setting an adequate depth would ensure capturing all child pages x number deep.
I’ve relied on ScrapBook for years now to store web pages I want to retain offline. Great organizational and search features. It’s a truly useful extension that’s worth investigating. (I have no affiliation with the creator of this extension; just a fan!)
Dan
Martin: I completely agree with Dan on Scrapbook. I have also been using Scrapbook for about 1.5 years now. You can set the capture depth, which has a somewhat similar effect. Actually it can do far more than just rip a website.
Initially it may take some time to get used to, but after a while you really won't want to leave it. It also supports bookmarking, tagging (via a 3rd-party extension) and webpage comments. In fact, many standalone applications have been built to mimic its features since Scrapbook came out. Metaproducts Inquiry is one example.
In fact I love it so much I have written my first ever review on Scrapbook :)
I am in love with HTTrack too. But some sites like Wikipedia are smart enough to block this kind of website downloader with their own robots.txt
This has to be the best website ripper ever. And it is free. It has helped me find out how people hid certain scripts to fudge traffic exchange.