Trackers may collect data that you type even before hitting submit
Many websites come with web forms, for example, to sign in to an account, create a new account, leave a public comment or contact the website owner. What most Internet users may not know is that data typed into these forms may be collected by third-party trackers, even before the form is submitted.
A research team from KU Leuven, Radboud University and University of Lausanne analyzed the data collection of third-party trackers on the top 100K global websites. Results have been published in the research paper Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission.
Leaked data included personal information, such as the user's email address, names, usernames, and messages typed into forms, as well as passwords on 52 occasions. Most users are unaware that third-party scripts, which include trackers, may collect this kind of information while they type on sites. Even when submitting content, most may expect it to remain confidential and not be leaked to third parties. Browsers do not reveal the activity to the user; there is no indication that data is collected by third-party scripts.
Results differ based on location
Data collection differs depending on the user's location. The researchers evaluated the effect of user location by running the tests from locations in the European Union and the United States.
The number of email leaks was 60% higher for the location in the United States than it was for the location in the European Union. In numbers, emails were leaked on 1844 sites when connecting to the top 100k websites from the European Union and on 2950 sites when connecting to the same set of sites from the United States.
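As a quick check on the arithmetic, the relative increase can be computed directly from the two site counts reported above. The variable names are illustrative only.

```python
# Site counts reported for the crawls of the top 100K websites.
eu_sites_with_email_leaks = 1844
us_sites_with_email_leaks = 2950

# Relative increase of the US count over the EU count.
relative_increase = (
    us_sites_with_email_leaks - eu_sites_with_email_leaks
) / eu_sites_with_email_leaks

print(f"US count is {relative_increase:.1%} higher than the EU count")
# prints "US count is 60.0% higher than the EU count"
```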
The majority of sites that leaked emails when connecting from the EU location, 94.4%, also leaked emails when connecting from the US.
Leakage when using mobile web browsers was slightly lower in both cases. 1745 sites leaked email addresses when using a mobile browser from a location in the European Union, and 2744 sites leaked email addresses from a location in the United States.
More than 60% of leaks were identical on desktop and mobile versions according to the research.
The mobile and desktop websites where emails are leaked to tracker domains overlap substantially but not completely.
One explanation for the difference is that the mobile and desktop crawls did not take place at the same time but one month apart. Some trackers were found to be active on mobile or desktop sites only.
The researchers suggest that stricter European privacy laws play a role in the difference. The GDPR (General Data Protection Regulation) applies when sites and services collect personal data. Organizations that process personal data are responsible for complying with the GDPR.
The researchers believe that email exfiltration by third parties "can breach at least three GDPR requirements".
First, if such exfiltration happens surreptitiously, it violates the transparency principle.
Second, if such exfiltration is used for purposes such as behavioral advertising, marketing and online tracking, it also breaches the purpose limitation principle.
Third, if the email exfiltration is used for behavioral advertising or online tracking, the GDPR typically requires the website visitor’s prior consent.
Only 7720 sites in the EU and 5391 sites in the US displayed consent popups on connect; that's 7.7% of all sites visited from the EU and 5.4% of all sites visited from the US.
The researchers discovered that the number of sites with leaks decreased by 13% in the US and by 0.05% in the EU when all data processing was rejected via the consent popups. Most Internet users might expect a reduction of 100% when not giving consent, but this is apparently not the case. The small decrease in the EU is likely caused by the low number of websites that had both a detected cookie popup and observed leaks.
Site categories, trackers and leaks
Sites were added to categories such as fashion/beauty, online shopping, games, public information and pornography by the researchers. Sites in all categories, with the exception of pornography, leaked email addresses according to the researchers.
Fashion/Beauty sites leaked data in 11.1% (EU) and 19.0% (US) of all cases, followed by Online shopping with 9.4% (EU) and 15.1% (US), and General News with 6.6% (EU) and 10.2% (US). The fourth spot differed by location: Software/Hardware with 4.9% in the EU and Business with 6.1% in the US.
Many sites embed third-party scripts, usually for advertising purposes or website services. These scripts may track users, for example, to generate profiles to increase advertising revenue.
The top sites that leaked email address information were different depending on the location. The top 3 sites for EU visitors were USA Today, Trello and The Independent. For US visitors, they were Issuu, Business Insider, and USA Today.
Further analysis of the trackers revealed that a small number of organizations were responsible for the bulk of the leaked form data. Values were once again different depending on location.
The five organizations that operate the largest number of trackers on sites that leak form data were Taboola, Adobe, FullStory, Awin Inc. and Yandex in the European Union, and LiveRamp, Taboola, Bounce Exchange, Adobe and Awin in the United States.
Taboola was found on 327 sites when visiting from the EU, LiveRamp on 524 sites when visiting from the US.
Protection against third parties that leak form data
Web browsers do not reveal to users whether third-party scripts collect data that users enter on sites, even before it is submitted. While most browsers, with the notable exception of Google Chrome, include anti-tracking functionality, it appears that this functionality is not suited to protecting user data against this form of tracking.
The researchers ran a small test using Firefox and Safari to find out if the default anti-tracking functionality blocked data exfiltration on the sample. Both browsers failed to protect user data in the test.
Browsers with built-in ad-blocking functionality, such as Brave or Vivaldi, and ad-blocking extensions such as uBlock Origin, offer better protection against data leaking. Users on mobile devices may use browsers that support extensions or include ad-blocking functionality by default.
The researchers developed the browser extension LeakInspector. Designed to inform users about sniffing attacks and to block requests that contain personal information, LeakInspector protects users' data while active.
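The idea of flagging requests that contain personal information can be illustrated with a minimal sketch. This is not LeakInspector's actual code: the function names `candidate_encodings` and `request_leaks_email`, the chosen set of encodings, and the example URLs are all assumptions made for illustration. The sketch precomputes a few common representations of an email address (plain text, URL-encoded, Base64, MD5 and SHA-256 hex digests) and checks whether any of them appears in an outgoing request's URL or body.

```python
from __future__ import annotations

import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(email: str) -> set[str]:
    """Return a few representations a script might use to transmit an email.

    The selection is an assumption for illustration, not an exhaustive list.
    """
    raw = email.encode("utf-8")
    return {
        email,                                  # plain text
        quote(email, safe=""),                  # percent-encoded
        base64.b64encode(raw).decode("ascii"),  # Base64
        hashlib.md5(raw).hexdigest(),           # MD5 hex digest
        hashlib.sha256(raw).hexdigest(),        # SHA-256 hex digest
    }


def request_leaks_email(url: str, body: str, email: str) -> bool:
    """Check whether any known representation of the email occurs in the request."""
    haystack = url + "\n" + body
    return any(token in haystack for token in candidate_encodings(email))


# Hypothetical example: a request that carries the SHA-256 hash of the address.
email = "user@example.com"
hashed = hashlib.sha256(email.encode("utf-8")).hexdigest()
tracker_url = "https://tracker.example/collect?h=" + hashed

print(request_leaks_email(tracker_url, "", email))              # True
print(request_leaks_email("https://site.example/", "", email))  # False
```

A simple substring check like this misses layered or unusual encodings, compression, and differences in letter case, so a real tool would likely need a broader search.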
The extension's source is available on GitHub. The developers could not submit the extension to the Chrome Web Store, as it requires access to features that are only available in Manifest V2. Google accepts only Manifest V3 extensions in its Chrome Web Store. A Firefox version is being published on the Mozilla Add-ons store.
Now You: what is your take on this?
Pornography sites don’t leak. Every other category needs to use an equally effective prophylactic.
Interesting, perhaps not a surprise considering other papers within the same context that have been posted here.
In the spirit of ‘Don’t bother…’ I have not understood if it can be a useful addition or a redundant extension for those who already use uBlock Origin and ‘strict mode’ on Firefox.
Looks handy, Martin. Will you let me/us know when there is a Firefox add-on developed?
I ask this because I could not find anywhere I could sign up to receive an email when the add-on is developed.
Or did I miss it?
Third-party trackers, the calamity.
As an example I discovered a site a few days ago and was stunned by the ‘heaviness’ of the page.
I submitted the site to the ‘Markup Blacklight’ and, besides the analysis stating 31 ad-trackers and 61 third-party cookies, appeared the use of a session recorder :
31 Ad trackers found on this site.
61 Third-party cookies were found.
This website could be monitoring your keystrokes and mouse clicks.
Blacklight detected the use of a session recorder, which tracks user mouse movement, clicks, taps, scrolls, or even network activity. This data is compiled into videos and heat maps that website owners can watch to see how users interact with the site. Research has shown these practices can be insecure and make sensitive user data such as passwords and credit card information more vulnerable to leaks. This technique was used by fifteen percent of popular websites when we scanned them in September 2020.
Blacklight detected a script belonging to the company Hotjar Ltd doing this on this site
Some domains simply don’t give a damn of users’ right to privacy. I’ve blocked several domains for such reasons, not for a couple of ad trackers and/or 3rd-party cookies, but when it comes to an abomination such as 31 ad trackers + 61 third-party cookies + a session recorder … I simply boycott, period. Not that i’m concerned given uBlock Origin & system-wide blacklists prevent such trackers from being called in the first place, but simply because worse than bad is unacceptable. A shame.
I’ll have a look at the Firefox extension being published on the Mozilla Add-ons store for Firefox but it’d be as a quick information tool and personally I’m not fond of spending time investigating the chaotic turpitude of the Web (no more than that of life), preferring a set & forget approach. But the set & forget sometimes meets such an inconceivable amount of trash that you just can’t just forget it. I’ll auto-censor my opinion concerning such places.
Only by refusing third-party cookies you cannot navigate on ‘Lancet’ for example, not to mention all others sites where you have to meticulously configure uBlock, hide various popup and whatever else.
@Shiva, hi there :=)
If my approach concerning tracking was only pragmatic, given all is blocked here, I’d just pass my way.
Point is years haven’t changed my revolt when it comes to flagrant misbehavior. Not political, factual.
So I always consider two factors when commenting : my experience and what I can conceive the experience to be for disarmed users.
As for blocking 3rd-party cookies, far from being advice, which I give only when asked for, I avoid any site requiring a 3rd-party cookie in order to run correctly. An exception for logging in when the mother site is different than the visited site, such as I think YouTube & Google (not sure though, no YouTube registration here and Google cookies blocked). An exception I accept in that it doesn’t revolt me, not in that I accept it personally.
We, you, many users here, myself manage to avoid the worst of the Web, we share our experiences, thoughts, remedies… but I just can’t imagine what tracking is for an immense proportion of users who ignore and/or underestimate what is done of their lives’ data. THAT revolts me.
Tom, I agree with you. What I wanted to specify is that at least that site doesn’t bust my nuts if I reject all. Not really a compliment.
As a personal experience, third-party cookies (but also first-party cookies to a lesser extent) are not strictly mandatory in the almost all the sites that I go often to. Curiously this does not apply to some sites related to academic publishers (certainly I can’t boycott them) so I need to make a session exception in the case I’m at ETP custom mode. You can also take a look with Blacklight to science.org for another example and… no cookies, no browsing.
The real pain in the a… are third-party scripts nowadays if you use uBlock medium mode and you have to do a research on the web.
I never encounter sites requiring 3rd-party cookies but in case I did I do have a userChromeJS script/toolbar button which is set by default to block 3rd-party cookies and when clicked switches to block all cookies then to block none (then back to block 3rd-party cookies). So in case I’d encounter a site requiring 3rd-party cookies I can switch to allow all. Anyway all is cleanable when exiting the site with the Cookie Autodelete extension.
I never block all cookies (otherwise than for testing), only 3rd-party ones.
I tested science.org, it displayed correctly with 3rd-party cookies blocked.
As for uBlock Origin, I use the hard mode (3rd-party, 3rd-party scripts, 3rd-party frames blocked by default). uBO’s mode, needless to say, has a great impact on sites’ cookie requirements.
Another feature which impacts cookie requirements is the user’s User-Agent. I use the ‘User-Agent Switcher’ extension for Firefox and when a site appears too zealous with cookies I set my User-Agent for that site to Google Bot or Bing-Bot. That often helps, calms down many sites’ cookie hysteria, such as sites redirecting to Guce servers for advertisement consent, e.g. Yahoo: I tell Yahoo that my User-Agent is Google-Bot and I enter directly into its pages without passing by Guce. Guce is blocked anyway which proves that no connection is even established with Guce servers. No Yahoo Mail! but I happen to visit its News pages…
IMO connections to 3rd-party servers should concern only Content delivery Networks (CDN). Besides that, essentially a pain, mostly ads, often trackers -> calamity. So blocking all 3rd-party connections by default in uBO is THE trick, with exceptions as we conceive them to be (here CDNs, healthy one I know, are exceptions).
Lastly, the not less famous LocalCDN extension, which, together with uBO (#1), Cookie Autodelete (#2) brings a remarkable basic and essential task force.
science.org was not about 3rd-party cookies (seeing now my few exceptions probably it was a site related to Elsevier), but in comparison to nbcnews.com you can’t disable all if you want and it also has its numbers on Blacklight. Ok, 31\61 (+ session recording) is quite nasty but maybe is better not investigate too much between sites because you realize that perhaps it is not the exception but the rule.
About security\privacy extensions now I have only uBlock, LocalCDN and Chameleon. The last one is not set for spoofing because I already use RFP, but I find it useful to manage referrer, quickly enable\disable RFP for the session if needed and it has CSS Exfil Protection integrated.
Am I surprised? No! Am I annoyed? Yep!
There is absolutely no limit to how low these jerks will stoop to monetize people and their data.
I love how none of this ever gets talked about in the mainstream but the constant BS that gets flogged daily gets talked about constantly but hey when you consider NEWS sites are amongst the top offenders then what do you expect.
Take me back to the days of Geocities websites, at least there wasn’t this level of BS around back then. I’m also fed up with all the idiotic layers of BS on websites that are not necessary at all, not to mention stupid web elements that follow you around a website when you scroll.
Google’s internet by design. I am not surprised that the extension cannot be submitted on their feeble webstore with purposely gimped extensions controlled by the dark lord Google.
This is year long known and also an issue with password and login prompts. Which is why “smart” people do not autofill these. It takes a mere JS ajax request to send the data elsewhere.
Useful site to check those kinds of details: https://whotracks.me/
I may be naive, but I really don’t see how this could be legal.