Read articles behind paywalls by masquerading as Googlebot
The Internet is at a tipping point. The continued rise of adblocking has put an end to the revenue model that relies solely on ad dollars to operate websites and businesses.
News sites in particular have started to experiment with ways to diversify their income sources, and one prominent option that sites like The Wall Street Journal, Financial Times, The New York Times, The Times, or The Washington Post have implemented or tested is the paywall.
There are different types of paywalls, but they all have one thing in common: they block access to content. This may happen as soon as the first article is opened, after a certain number of articles have been read on the site, or through an excerpt system that displays the first paragraph to the reader and, below it, sign-up information for reading the rest.
Paywalls may not always require users to pay money for access. Some sites require users to sign up to use the site but won't charge them once they have registered.
A paywall may make sense from a business point of view, and may be more lucrative than battling it out with users who run adblockers, but there is a downside to it both for the paywalled site and for the blocked user.
Sites lose a high percentage of visitors if they implement a paywall system. It is unclear how high the percentage really is, and it probably varies from site to site, but it is likely a lot higher than the percentage of visitors who subscribe to the site after being presented with the choice to subscribe to read the desired article.
For users, it can be frustrating to follow a link to an interesting-sounding article only to be blocked from reading it once the page has loaded; it is a waste of time for many, especially if no content at all is shown before signing up or subscribing.
Masquerade your browser
It is no secret that news sites allow access to news aggregators and search engines. If you check Google News or Search for instance, you will find articles from sites with paywalls listed there.
In the past, news sites allowed access to visitors coming from major news aggregators such as Reddit, Digg or Slashdot, but that practice seems to be as good as dead nowadays. Some may still allow it but it is trial and error, and the workaround may be shut down at any time.
Another trick, pasting the article title into a search engine to read the cached copy of the story directly, does not work reliably anymore either, as articles on paywalled sites are usually no longer cached.
User-Agent and Referrer
You are probably wondering how sites block or allow access to their content. The methods have improved over the years, and it is no longer enough to simply change the browser's referrer to https://www.google.com/ to gain full access to a site's content.
Instead, sites use various checks that include user-agent, referrer and cookies, and sometimes even more than that, to determine the legitimacy of access.
Probably the best way to masquerade the browser is to make it appear to be Googlebot.
- Referrer: https://www.google.com/
- User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
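As an illustration of what those two values look like on the wire, here is a minimal sketch in plain JavaScript (the helper name is invented for this article; note that the header is spelled "Referer" in HTTP, even though prose usually writes "referrer"):

```javascript
// Build the two spoofed request headers described above.
// "Referer" is the on-the-wire spelling defined by the HTTP standard.
function buildGooglebotHeaders() {
  return {
    "Referer": "https://www.google.com/",
    "User-Agent":
      "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  };
}
```

Ordinary page scripts cannot freely override headers like these on their own requests, which is one reason browser extensions are used for this instead.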
Note that the option does not work anymore on many sites. It may be better to try and masquerade as coming from Twitter or other social media sites.
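One likely reason the trick fails: Google documents that a genuine Googlebot can be verified with a reverse DNS lookup on the visitor's IP address, since the crawler's hostnames end in googlebot.com or google.com. A publisher's server-side check might look roughly like this sketch (the function name is invented; the hostname is assumed to come from a reverse DNS lookup, e.g. dns.reverse() in Node.js):

```javascript
// A real Googlebot request comes from an IP whose reverse DNS name
// ends in .googlebot.com or .google.com. A spoofed User-Agent sent
// from a home connection fails this check.
function looksLikeRealGooglebot(resolvedHostname) {
  return /\.(googlebot|google)\.com$/.test(resolvedHostname);
}
```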
Firefox users need two browser add-ons for that: the first, RefControl, to change the referrer value when visiting news sites, the second, User Agent Switcher, to change the user agent of the browser.
Update: RefControl is no longer available. You may try this alternative instead.
- Download and install both extensions in the Firefox web browser.
- Tap on the Alt-key, and select Tools > RefControl Options.
- Click on "add site", enter a domain name under site, select custom action, and enter https://www.google.com/ as the referrer.
- Repeat this for all news sites you want to access (some may not work even if you make the changes, so keep that in mind).
- When you are done, close the configuration window.
- Tap on the Alt-key again, and select Tools > Default User Agent > Edit User Agents from the menu.
- Select New > User Agent, and replace the string in the User Agent field with Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html). Name it Googlebot.
- Exit the menu.
- Before you access these sites, tap on Alt, and select Default User Agent > Googlebot.
This is all there is to it. It is a bit unfortunate that there is no extension for Firefox that changes the user agent automatically based on the sites you visit.
There is however another possibility, and that is to create a custom extension which automates the process in the browser.
Instructions are provided on Elaineou. All it takes, basically, is to create a new directory on the local computer, create the two files background.js and manifest.json inside it, and copy and paste the code found on the site into the files.
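As a rough idea of what such a background.js does (this is a hedged sketch, not the exact code from Elaineou's post; the site list and function name are illustrative), the core is a function that rewrites the outgoing request headers, which the extension then wires to Chrome's webRequest API:

```javascript
// Illustrative list of paywalled hosts the extension acts on.
const SITES = ["wsj.com", "ft.com", "nytimes.com"];

// Chrome passes request headers as an array of {name, value} pairs.
// Remove any existing User-Agent/Referer entries and append the
// spoofed Googlebot values in their place.
function spoofHeaders(headers) {
  const spoofed = {
    "user-agent":
      "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "referer": "https://www.google.com/",
  };
  const out = headers.filter((h) => !(h.name.toLowerCase() in spoofed));
  for (const [name, value] of Object.entries(spoofed)) {
    out.push({ name, value });
  }
  return out;
}

// In the extension itself, this runs for every matching request, e.g.:
// chrome.webRequest.onBeforeSendHeaders.addListener(
//   (details) => ({ requestHeaders: spoofHeaders(details.requestHeaders) }),
//   { urls: SITES.map((s) => "*://*." + s + "/*") },
//   ["blocking", "requestHeaders"]
// );
```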
You need to enable "developer mode" on chrome://extensions/; then select "load unpacked extension" and pick the folder containing the two files to load the extension in Chrome.
You may modify the list of sites it supports to add new ones.