Google Bot has privileges, or, how to browse the Internet as Google's Bot

Martin Brinkmann
Aug 15, 2007
Updated • Nov 8, 2017
Internet
|
9

Last year I reviewed a method to load all content on the Experts-Exchange website by disguising the browser as Googlebot. Or more precisely, your browser's user agent header.

The site blocked unregistered users from accessing content on the site, but allowed googlebot to access the content.

Apparently a similar story makes its way around the Internet these days with a more detailed approach detailing the steps that you have to partake to be identified as Googlebot.

It is not enough to simply change the User-Agent string to Googlebot if the website in question checks for cookies, uses Javascript for detection, or compares the IP to make sure it is indeed in Google's IP range.

Modifying just the User-Agent might work to gain access to some websites, but others probably will not work because they perform additional checks.

Google Bot user agent

Here are the five factors that are important:

  • IP: Use Google Translate to surf the site. You can alternatively use a web proxy or regular proxy, use the anonymizer Tor or a virtual private network for the same effect.
  • User-Agent: Use the Firefox Extension User-Agent Switcher and add the information about Googlebot.
  • Javascript: Use a Firefox Extension like No Script to turn it off on sites you visit (or more precisely, stop any JavaScript program from running automatically)
  • Cookies: Use the Firefox Extension Cookie Safe to block cookies that the site tries to set.
  • Referrer: Use the Firefox Extension RefControl to disable the Referrer.

Keep in mind that it may be sufficient to use some of the options and not all of them. Depending on the website, you may only need to change your user agent or IP to access the contents. The only thing you can do to find out is to test it using various setups.

The website describing the techniques is currently down because it was not able to handle the massive amount of visitors that Digg and other sites sent to it.

Update: The website is up again and you find all relevant information on it again.

Update 2: The website is down again and it is unlikely that it will come back up again. I have removed the link, but the information above should be enough to get you started.

The one thing that you need to do at all times is to set the user agent of your browser to Googlebot. If that is not enough, you may need to make use pf (some of) the other four factors outlined above to get it to work.

Summary
Google Bot has privileges, or, how to browse the Internet as Google's Bot
Article Name
Google Bot has privileges, or, how to browse the Internet as Google's Bot
Description
If changing the user agent to Googlebot does not work to access a site's content that is blocked, you may need to make these additional adjustments.
Author
Publisher
Ghacks Technology News
Logo
Advertisement

Tutorials & Tips


Previous Post: «
Next Post: «

Comments

  1. ilev said on August 4, 2012 at 7:53 pm
    Reply

    Doesn’t Windows 8 know that www. or http:// are passe ?

    1. Martin Brinkmann said on August 4, 2012 at 7:57 pm
      Reply

      Well it is a bit difficulty to distinguish between name.com domains and files for instance.

    2. Leonidas Burton said on September 4, 2023 at 4:51 am
      Reply

      I know a service made by google that is similar to Google bookmarks.
      http://www.google.com/saved

  2. VioletMoon said on August 16, 2023 at 5:26 pm
    Reply

    @Ashwin–Thankful you delighted my comment; who knows how many “gamers” would have disagreed!

  3. Karl said on August 17, 2023 at 10:36 pm
    Reply

    @Martin

    The comments section under this very article (3 comments) is identical to the comments section found under the following article:
    https://www.ghacks.net/2023/08/15/netflix-is-testing-game-streaming-on-tvs-and-computers/

    Not sure what the issue is, but have seen this issue under some other articles recently but did not report it back then.

  4. Anonymous said on August 25, 2023 at 11:44 am
    Reply

    Omg a badge!!!
    Some tangible reward lmao.

    It sucks that redditors are going to love the fuck out of it too.

  5. Scroogled said on August 25, 2023 at 10:57 pm
    Reply

    With the cloud, there is no such thing as unlimited storage or privacy. Stop relying on these tech scums. Purchase your own hardware and develop your own solutions.

    1. lollmaoeven said on August 27, 2023 at 6:24 am
      Reply

      This is a certified reddit cringe moment. Hilarious how the article’s author tries to dress it up like it’s anything more than a png for doing the reddit corporation’s moderation work for free (or for bribes from companies and political groups)

  6. El Duderino said on August 25, 2023 at 11:14 pm
    Reply

    Almost al unlmited services have a real limit.

    And this comment is written on the dropbox article from August 25, 2023.

  7. John G. said on August 26, 2023 at 1:29 am
    Reply

    First comment > @ilev said on August 4, 2012 at 7:53 pm

    For the God’s sake, fix the comments soon please! :[

  8. Kalmly said on August 26, 2023 at 4:42 pm
    Reply

    Yes. Please. Fix the comments.

  9. Kim Schmidt said on September 3, 2023 at 3:42 pm
    Reply

    With Google Chrome, it’s only been 1,500 for some time now.

    Anyone who wants to force me in such a way into buying something that I can get elsewhere for free will certainly never see a single dime from my side. I don’t even know how stupid their marketing department is to impose these limits on users instead of offering a valuable product to the paying faction. But they don’t. Even if you pay, you get something that is also available for free elsewhere.

    The algorithm has also become less and less savvy in terms of e.g. English/German translations. It used to be that the bot could sort of sense what you were trying to say and put it into different colloquialisms, which was even fun because it was like, “I know what you’re trying to say here, how about…” Now it’s in parts too stupid to translate the simplest sentences correctly, and the suggestions it makes are at times as moronic as those made by Google Translations.

    If this is a deep-learning AI that learns from users’ translations and the phrases they choose most often – which, by the way, is a valuable, moneys worthwhile contribution of every free user to this project: They invest their time and texts, thereby providing the necessary data for the AI to do the thing as nicely as they brag about it in the first place – alas, the more unprofessional users discovered the translator, the worse the language of this deep-learning bot has become, the greater the aggregate of linguistically illiterate users has become, and the worse the language of this deep-learning bot has become, as it now learns the drivel of every Tom, Dick and Harry out there, which is why I now get their Mickey Mouse language as suggestions: the inane language of people who can barely spell the alphabet, it seems.

    And as a thank you for our time and effort in helping them and their AI learn, they’ve lowered the limit from what was once 5,000 to now 1,500…? A big “fuck off” from here for that! Not a brass farthing from me for this attitude and behaviour, not in a hundred years.

Leave a Reply

Check the box to consent to your data being stored in line with the guidelines set out in our privacy policy

We love comments and welcome thoughtful and civilized discussion. Rudeness and personal attacks will not be tolerated. Please stay on-topic.
Please note that your comment may not appear immediately after you post it.