Computer users are exposed to a variety of tracking technologies when they browse the Internet, from traditional third-party tracking cookies to local storage, Flash cookies and fingerprinting.
Companies that develop browsers aim to reduce the tracking their users are exposed to on the Internet, for instance by implementing Do Not Track options or changing the way third-party cookies are handled.
While that takes care of some forms of tracking, it does not touch others.
Fingerprinting became a topic back in 2010 when the EFF released an online tool to compute a browser's fingerprint. It was a first attempt to demonstrate that fingerprinting could indeed be used to track users on the Internet.
While it was common knowledge that fingerprinting was used, it was not clear how widespread it really was.
A recent study suggests that at least 1% of the top 10,000 websites use fingerprinting techniques to track users. The researchers used the rankings provided by Alexa, an Amazon company, for their study.
Fingerprinting techniques all have in common that they extract data, either directly during connection attempts or afterwards by parsing log files, to identify unique data sets that can be associated with individual Internet users.
It is, for instance, possible to retrieve the list of installed fonts, the screen size or the installed plugins from a user's system.
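To make the idea more concrete, here is a minimal sketch of how a handful of such attributes could be combined into a single identifier. It is not the method used by any particular tracker or by the researchers; the function name `fingerprint`, the attribute names, and the sample values are all hypothetical, and SHA-256 is simply one convenient choice of hash.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable hex digest for a set of browser attributes (illustrative only)."""
    # Sort list values so that the order in which fonts or plugins
    # are reported does not change the resulting identifier.
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in attributes.items()
    }
    # Serialize with sorted keys to get one canonical string per attribute set.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, made up for this example.
visitor_a = {
    "fonts": ["Arial", "Calibri", "Verdana"],
    "screen": "1920x1080",
    "plugins": ["Flash", "Java"],
}

# The same system reporting its attributes in a different order.
visitor_a_again = {
    "plugins": ["Java", "Flash"],
    "screen": "1920x1080",
    "fonts": ["Verdana", "Arial", "Calibri"],
}

# A different system that only differs in screen size.
visitor_b = dict(visitor_a, screen="1366x768")

print(fingerprint(visitor_a) == fingerprint(visitor_a_again))
print(fingerprint(visitor_a) == fingerprint(visitor_b))
```

The point of the sketch is that no cookie is needed: as long as the combination of attributes is rare enough, the same system produces the same identifier on every visit, while a system that differs in even one attribute produces a different one.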
The program the researchers used crawled the top 1 million websites according to Alexa to determine if common fingerprinting techniques were used by the sites.
While at least 1% of the top 10,000 sites were found to use fingerprinting, only 404 of the top 1 million sites according to Alexa were found to do so.
It should be noted that the actual number may well be larger than that. First, the researchers were not able to determine whether a website used server-side fingerprinting. Second, there is no common fingerprinting standard, which means that some attempts may not have been detected correctly.
One interesting result is a list of fingerprinting providers that the researchers discovered.
The research paper lists detailed information about the methodology used to crawl the sites, counter-measures, and other information that you may find useful.
The script used to crawl the sites will be published in the future on the website linked above. This is also the location where the research paper can be downloaded as a PDF document.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.