Download web pages to your local computer with WebCopy
Sometimes you may want to download a website, or part of it, to your local system. Maybe you want to make use of the contents while you are offline, or for safekeeping reasons so that you can access the contents even if the website becomes temporarily or permanently unavailable.
My favorite tool for the job is HTTrack. It is free and ships with an impressive number of features. That is great if you spend some time getting used to what the program has to offer, but sometimes you may want a faster solution that you do not have to configure extensively before use.
That's where WebCopy comes into play. It is a sophisticated program as well, as you will find out when you dig deeper into the application's settings, but if you want to copy a web page to your local system quickly, you can do so right away and ignore the advanced configuration options.
- Paste or enter a web address into the website field in WebCopy.
- Make sure the save folder is correct.
- Click on copy website to start the download.
That's all there is to it. The program processes the selected page for you, echoing the progress in the results tab of the interface. Here you see downloaded and skipped files, as well as errors that may prevent the download altogether. The error message may help you analyze why a particular page or file cannot be downloaded. Most of the time though, you can't really do anything about it.
You can access the locally stored copies with a click on the open local folder button, or by navigating to the save folder manually.
This basic option only gets you so far, as you can only copy a single web page this way. You need to define rules if you want to download additional pages or even the entire website. Rules may also help when you encounter broken pages that cannot be copied, as you can exclude them from the download so that the remaining pages still get downloaded to the local system.
To add rules, right-click on the rules listing in the main interface and select add from the options. Rules are patterns that are matched against the website structure. To exclude a particular directory from being crawled, you simply add it as a pattern and select the exclude option in the rules configuration menu.
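To make the idea of exclude rules more concrete, here is a minimal Python sketch of how a crawler could filter URLs against a list of patterns. It is not WebCopy's actual implementation: it assumes the patterns are regular expressions applied to the URL path, and the pattern list, the `is_excluded` helper, and the example.com addresses are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical exclude rules; the paths are invented for this example.
EXCLUDE_PATTERNS = [r"^/forum/", r"\.zip$"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL's path matches any of the exclude patterns."""
    path = urlsplit(url).path
    return any(re.search(pattern, path) for pattern in patterns)


urls = [
    "https://example.com/index.html",
    "https://example.com/forum/thread-1.html",
    "https://example.com/files/archive.zip",
]

# Keep only the URLs that no exclude rule matches.
to_download = [url for url in urls if not is_excluded(url)]
print(to_download)
```

Excluding a whole directory then comes down to one pattern that matches the start of its path, which is roughly what the exclude option described above does for you.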
The rules system is still not as intuitive as HTTrack's link depth parameter, which you can use to define the depth of the crawl and download.
WebCopy supports authentication, which you can add in the forms and password settings. Here you can add a web address that requires authentication, and a username and password that you want the web crawler to use to access the contents.
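For readers unfamiliar with the concept, a link depth limit can be sketched as a breadth-first walk that stops following links once a page is a given number of clicks away from the start page. The snippet below is only an illustration of that idea, not how HTTrack or WebCopy work internally. The `LINKS` table stands in for a real website and the `crawl` function is a hypothetical helper.

```python
from collections import deque

# Hypothetical site structure: each page maps to the pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/b/1": [],
    "/a/1/x": [],
}


def crawl(start, max_depth, links=LINKS):
    """Return the pages reachable within max_depth links of start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


print(crawl("/", 0))  # start page only
print(crawl("/", 1))  # start page plus the pages it links to
print(crawl("/", 2))  # one level further
```

A single number controls how far the crawl spreads, which is why a depth setting tends to feel simpler than writing a pattern for every part of a site.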
- The website diagram menu displays the structure of the active website to you. You can use it to add rules to the crawler.
- You can add additional URLs that you want included in the download under Project Properties > Additional URLs. This can be useful if the crawler cannot discover the URLs automatically.
- The default user agent can be changed in the options. While that is usually not necessary, you may encounter some servers that block it so that you need to modify it to download the website.
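The user agent tip boils down to sending a different `User-Agent` header with each request. The following Python sketch shows that using the standard library's `urllib.request`. It only builds the request and does not send it. The user agent string and the `build_request` helper are hypothetical and have nothing to do with WebCopy's actual default.

```python
from urllib.request import Request

# Hypothetical user agent string; this is not WebCopy's default.
CUSTOM_USER_AGENT = "Mozilla/5.0 (compatible; ExampleCrawler/1.0)"


def build_request(url, user_agent=CUSTOM_USER_AGENT):
    """Create a request object that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
# urllib normalizes header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

A server that blocks a crawler's default identifier usually does so by inspecting exactly this header, so changing its value is often enough to get the download going.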
The program is ideal for downloading single web pages to the local system. The rules system, on the other hand, is not that comfortable to use if you want to download multiple pages from a website. I'd prefer an option in the settings to simply select the link depth that I want the program to crawl and be done with it. (via Make Tech Easier)