Last year I reviewed a method to load all content on the Experts-Exchange website by disguising the browser, or more precisely the browser's User-Agent header, as Googlebot.
The site blocked unregistered users from accessing its content but allowed Googlebot to read it.
A similar story is apparently making its way around the Internet these days, with a more detailed description of the steps you have to take to be identified as Googlebot.
Modifying just the User-Agent might be enough to gain access to some websites, but it will probably fail on others because they perform additional checks.
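The article does not say which additional checks a site performs. One check that Google itself documents for site owners is a reverse DNS lookup on the visitor's IP address followed by a forward lookup on the returned hostname. The Python sketch below shows roughly what such a server-side check could look like; the function name and the injectable lookup parameters are my own, chosen only for illustration, and real sites may do something entirely different.

```python
import socket


def looks_like_googlebot(ip,
                         reverse_lookup=socket.gethostbyaddr,
                         forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    Hypothetical helper: the lookup functions are parameters so the logic
    can be exercised without network access. gethostbyname_ex only handles
    IPv4, so this sketch ignores IPv6 visitors.
    """
    try:
        # Step 1: reverse lookup, IP address -> hostname
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must belong to one of Google's crawler domains
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # Step 3: forward lookup, hostname -> list of IP addresses
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # Step 4: the forward lookup must lead back to the original IP
    return ip in addresses
```

A visitor who only changes the User-Agent fails step 1 or 2 of a check like this, because the IP address still resolves to the visitor's own provider.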
Here are the five factors that are important:
Keep in mind that it may be sufficient to use some of the options and not all of them. Depending on the website, you may only need to change your user agent or IP to access the content. The only way to find out is to test various setups.
The website describing the techniques is currently down because it was not able to handle the massive number of visitors that Digg and other sites sent to it.
Update: The website is up again, and you can find all relevant information on it.
Update 2: The website is down again, and it is unlikely that it will come back up. I have removed the link, but the information above should be enough to get you started.
The one thing that you need to do at all times is to set the user agent of your browser to Googlebot. If that is not enough, you may need to make use of (some of) the other four factors outlined above to get it to work.
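Outside of a browser, the same user agent change can be tried from a script. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the string is the desktop user agent that Google publishes for Googlebot. It only changes the header, so it will not get past sites that check anything else.

```python
import urllib.request

# User-agent string that Google publishes for its desktop crawler
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_request(url):
    """Return a request whose User-Agent header claims to be Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch(url, timeout=10.0):
    """Download `url` with the Googlebot user agent and return the raw body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch()` with the address of a page and comparing the result to what a normal browser receives is a quick way to test whether a site looks at the user agent alone.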
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since become one of the most popular tech news sites on the Internet, with five authors and regular contributions from freelance writers.