There is a very high probability that individual Internet users can be identified by analyzing their browsing history alone, according to a new study by Mozilla.
Mozilla published the results of the study in the research paper "Replication: Why We Still Can't Browse in Peace: On the Uniqueness and Reidentifiability of Web Browsing Histories" [PDF link], which it presented at the USENIX security conference earlier this month.
Last year, Mozilla asked Firefox users to participate in an experiment to find out how effective browsing history is at identifying users on the Internet. The collected data resembles the data that third parties may collect through various tracking mechanisms on the Internet.
About 52,000 Firefox users agreed to participate in the study, which ran over the course of two weeks. Users shared their browsing history for both weeks, and Mozilla analyzed the data to find out whether the history collected in the first week could be used to identify the same users in the second week's data.
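The two-week matching idea can be illustrated with a minimal Python sketch. It assumes each user's weekly history is reduced to a set of visited domains and that profiles are compared with Jaccard similarity; the user names, domains, and the `jaccard` and `reidentify` helpers are hypothetical and are not taken from Mozilla's paper, which may use a different metric and pipeline.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reidentify(week_one, week_two):
    """Map each week-two user to the most similar week-one profile."""
    matches = {}
    for user, visited in week_two.items():
        matches[user] = max(
            week_one,
            key=lambda candidate: jaccard(week_one[candidate], visited),
        )
    return matches


# Hypothetical toy data: user -> set of domains visited during that week.
week_one = {
    "alice": {"news.example", "mail.example", "knitting.example"},
    "bob": {"news.example", "mail.example", "chess.example"},
    "carol": {"video.example", "recipes.example", "mail.example"},
}
week_two = {
    "alice": {"news.example", "knitting.example", "weather.example"},
    "bob": {"mail.example", "chess.example", "weather.example"},
    "carol": {"recipes.example", "video.example", "news.example"},
}

matches = reidentify(week_one, week_two)
correct = sum(1 for user, guess in matches.items() if user == guess)
print(f"re-identified {correct} of {len(week_two)} users")
```

Even though every user's second week differs from the first, the overlap in less common sites is enough to match each profile to the right person in this toy example.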
The researchers managed to identify nearly 49,000 "distinct browsing profiles" and discovered that 99% were unique.
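One plausible way to measure how many profiles are unique is sketched below in Python. It assumes a profile is simply the set of domains a user visited and counts a profile as unique when exactly one user has it; the `histories` data and the `profile_uniqueness` helper are hypothetical, and the paper's own definition of a profile may differ.

```python
from collections import Counter


def profile_uniqueness(histories):
    """Return (number of distinct profiles, share of users with a unique profile)."""
    profiles = {user: frozenset(sites) for user, sites in histories.items()}
    counts = Counter(profiles.values())
    unique_users = sum(1 for profile in profiles.values() if counts[profile] == 1)
    share = unique_users / len(profiles) if profiles else 0.0
    return len(counts), share


# Hypothetical toy data: two users share a profile, two have unique ones.
histories = {
    "u1": {"news.example", "mail.example"},
    "u2": {"news.example", "mail.example"},
    "u3": {"news.example", "chess.example"},
    "u4": {"mail.example", "chess.example", "video.example"},
}

distinct, share = profile_uniqueness(histories)
print(f"{distinct} distinct profiles, {share:.0%} of users have a unique one")
```

With real histories containing dozens of sites per user, exact collisions like the one between `u1` and `u2` become rare, which is consistent with the high uniqueness the study reports.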
50% of users could be identified using the top 10,000 websites if the users visited at least 50 distinct websites in the period. If users visited 150 or more sites, the probability of identification increased to 80% using the top 10,000 websites as the data pool.
The data confirms a study from 2012 that gathered its data in a different way. Back then, researchers set up a test site and used CSS code to find out which sites from a list of 6,000 domains users had visited. The 2012 study concluded that 97% of visitors had a unique set of visited sites from that list, and that this data alone could be used to track users across the web.
Mozilla's data was more accurate because it received the entire browsing history of the users who participated in the study.
The study confirms that third parties may use browsing history to create user profiles and track users across the Internet, provided that they manage to gain access to a large portion of a user's browsing history. Facebook and Alphabet, Google's parent company, observe large portions of the web, based on the analysis of third-party scripts in the browsing data. Alphabet (Google) access was found on 9823 of the top 10,000 websites, and Facebook access on 7348 sites. "Numerous companies" with access to between 2000 and 5000 of the top 10,000 sites were also detected.
The researchers recommend that users enable privacy protections in their browser of choice to reduce the tracking capabilities of these companies. Disabling or limiting third-party cookies, using Containers (a unique Firefox feature), modifying default privacy settings, deleting data regularly, and installing privacy extensions may limit a company's ability to identify a user based on the browsing profile, but these methods may not eliminate the threat entirely.
Now You: Do you protect your privacy online?
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since become one of the most popular tech news sites on the Internet, with five authors and regular contributions from freelance writers.