How to detect zero-width character fingerprinting
All modern web browsers support zero-width characters. These characters can be added to text on a page without users knowing about it, and without any way to tell with the naked eye that the text contains additional characters.
British security researcher Tom Ross described how zero-width characters can be used to add a logged-in user's username to text that the user copies. The invisible characters are carried along when the text is pasted, and all it takes then is a check that reveals them.
While the method may not work at all for fingerprinting a user's activity on the Internet, zero-width characters may be used to identify the source of a leak or to reveal important information about it.
The following text excerpt includes ten zero-width characters: F​or exam​ple, I’ve ins​erted 10 ze​ro-width spa​ces in​to thi​s sentence, c​an you tel​​l?
These characters are invisible to the eye, and they may not show up when you paste the copied text either. If you paste the text into an editor with spell checking, however, you will notice that it flags words that look perfectly normal.
Whoever inserts the characters can easily avoid that by adding them to the beginning or the end of words instead of the middle.
Ross published a proof of concept that converts a user's username to binary, a sequence of zeros and ones, and then represents each digit with a zero-width character.
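To illustrate the idea, here is a minimal Python sketch of that kind of encoding. It is not Ross's code: using U+200B for 0 and U+200C for 1, converting the username via its UTF-8 bytes, and the function names are all assumptions made for this example.

```python
# Hypothetical sketch of zero-width binary watermarking; not Ross's code.
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0 (assumed mapping)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1 (assumed mapping)


def encode_username(username: str) -> str:
    """Turn a username into an invisible run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in username.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def watermark(text: str, username: str) -> str:
    """Hide the encoded username right after the first visible character."""
    return text[:1] + encode_username(username) + text[1:]


def extract_username(text: str) -> str:
    """Recover the username, ignoring all visible characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # only complete bytes can be decoded
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    original = "For example, this sentence looks normal."
    marked = watermark(original, "alice")
    print(marked == original)            # False
    print(len(marked) - len(original))   # 40 (5 bytes x 8 bits)
    print(extract_username(marked))      # alice
```

Anyone who copies the marked sentence carries the 40 invisible characters along, and the extraction step only needs the pasted text.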
So, what can you do to detect whether copied text includes zero-width characters?
One option is to paste the text into an editor that reveals these characters. Head over to DiffChecker and paste the text into the left text field on the site.
The site displays any zero-width characters in the pasted text right away. If the text appears normal, it is clean.
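If you would rather not paste sensitive text into a website, a few lines of Python can do a similar job offline. This is a minimal sketch, assuming the five code points listed below are the ones you care about; the list is not exhaustive, and the function names are made up for this example.

```python
import unicodedata

# Assumed, non-exhaustive set of zero-width code points to flag.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def reveal(text: str) -> str:
    """Replace each zero-width character with a visible <U+XXXX> tag."""
    return "".join(f"<U+{ord(ch):04X}>" if ch in ZERO_WIDTH else ch for ch in text)


def find_zero_width(text: str) -> list:
    """List (index, Unicode name) for every zero-width character found."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in ZERO_WIDTH]


if __name__ == "__main__":
    pasted = "F\u200bor exam\u200bple"
    print(reveal(pasted))           # F<U+200B>or exam<U+200B>ple
    print(find_zero_width(pasted))  # [(1, 'ZERO WIDTH SPACE'), (9, 'ZERO WIDTH SPACE')]
```

As with DiffChecker, an empty result from find_zero_width means none of the listed characters are present.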
Another option is the Chrome extension Replace zero-width characters with emojis.
When activated, the extension replaces any zero-width characters it detects on sites you visit in Google Chrome with emoji.
Install the extension, click on its icon, and then click on the "show me" button to reveal any hidden zero-width characters on the page.
If you are in a situation where you don't want pasted text to be traced back to you, you may want to activate the extension whenever you are about to copy text.
Closing Words
Zero-width characters are just the latest thing that Internet users need to keep an eye on when they are connected to the Web. (via Bleeping Computer)
Visiting this page in 2022.
The latest version of Firefox shows those characters right here on the web page.
“The following text excerpt includes ten zero-width characters: F​or exam​ple, I’ve ins​erted 10 ze​ro-width spa​ces in​to thi​s sentence, c​an you tel​​l?”
Not sure if it’s stock Firefox or one of my add-ons doing it.
The ‘Document Statistics’ tool in ‘gedit’ shows a difference in the counts if you copy and paste one version and manually type the other. It doesn’t show where the different characters are, but at least you would know something is there.
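The same check can be scripted. The sketch below (function names are hypothetical) compares the number of code points in a pasted copy and a hand-typed copy and, going one step beyond the gedit approach, reports the first position where the two differ. Python's len() counts Unicode code points, which may not match gedit's counts exactly.

```python
from typing import Optional


def extra_characters(pasted: str, typed: str) -> int:
    """How many more code points the pasted copy has than the typed one."""
    return len(pasted) - len(typed)


def first_difference(pasted: str, typed: str) -> Optional[int]:
    """Index of the first differing character, or None if the strings match."""
    for i, (a, b) in enumerate(zip(pasted, typed)):
        if a != b:
            return i
    if len(pasted) != len(typed):
        # One string is a prefix of the other; they diverge where it ends.
        return min(len(pasted), len(typed))
    return None


typed = "can you tell?"
pasted = "c\u200ban you tel\u200b\u200bl?"
print(extra_characters(pasted, typed))  # 3
print(first_difference(pasted, typed))  # 1
```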
A user would have to copy and paste a lot, so this seems like a very targeted use case and not really applicable to the majority of users.
Target Audience: leakers/whistleblowers; investigative journalists/transparency advocates; privacy obsessives; and … plagiarists! ;-)
I tried an old-school method using the Windows 7 command prompt. Copy and paste the text into the command prompt and all the zero-width characters show up as question marks. Not of much use for editing, though!
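The question marks are most likely a code-page effect: the console cannot represent the zero-width characters, so each one is replaced with ?. The sketch below reproduces that in Python, assuming the console used code page 437 (the usual default for US-English Windows installs). It also shows why this is of little use for editing: the genuine question mark at the end of the sentence looks exactly the same.

```python
# Assumption: the console used code page 437, which has no slot for U+200B.
text = "F\u200bor exam\u200bple, can you tel\u200b\u200bl?"

# errors="replace" substitutes "?" for every character the code page lacks,
# mimicking what the command prompt displayed.
console_bytes = text.encode("cp437", errors="replace")
print(console_bytes.decode("cp437"))  # F?or exam?ple, can you tel??l?
```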
I just tried pasting that sentence in LibreOffice and got interesting results.
Instead of showing the red spelling error squiggle, the software actually places a grey highlight between the visible letters, right where the invisible ones are. It also makes the invisible letters appear as tiny / marks that I can delete (but whose font size I cannot change).
Thanks Martin. Great article! Someone posted a link to it on Reddit in r/privacy, too: https://redd.it/89yyuw
‘Copy Plain Text 2’ is an extension for Firefox/Pale Moon that strips these out as well.
@daveb: I just tested “Copy Plain Text 2” in Pale Moon, and it did NOT strip out the zero-width characters. I’m disappointed because I was hoping for a functionality upgrade from the Copy Plain Text function in “Extended Copy Menu (fix version)”….
If you copy and paste (or copy-plain-text and paste!) Martin’s sample text into LibreOffice Writer (6.x, at least), the zero-width characters are indicated with gray highlighting that overlaps onto the surrounding regular-width characters. (The highlighting is similar to that used for nonbreaking spaces and hyphens.) You can zap all of the zero-width characters by selecting and copying one of them and using it as the “find” character in a global find & replace that replaces the found character with nothing.
Edit: … highlighting similar to … nonbreaking spaces and *nonbreaking* hyphens.
@archie:
“plain paste”?
I assume that pasting text that contains zero-width characters into a “plain text” editor (like Windows Notepad) automatically deletes all of them from the copied text. Is that correct?
No. If you paste the text into Notepad and then check it using DiffChecker, you will notice that the characters are still included.
This surprised me! For years I’ve used Notepad-type text editors to make plain text as an intermediary step when copying HTML-formatted text for use in some other document. But these zero-width characters can slip through, I now realize!
I’d love to see a follow up post with more practical tips on how to deal with this issue. Which offline notepad/code editor tools can be used to quickly spot and, optionally, strip away these sneaky characters?
After more testing I notice that the zero-width characters are visible as question marks (?) in editors like Notepad++ *if* we set the encoding to ANSI. But that still doesn’t make it easy to strip them all out, since a search and replace on ? would also remove real question marks.
What is needed is a complete list of the Unicode code points for these zero-width characters and a small script that finds and removes all such characters from an input string.
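Here is a minimal Python sketch of such a script. The code point list is an assumption (a starting point, not a complete catalogue), and strip_zero_width is a made-up name. Because it removes characters by code point, real question marks are left alone. The optional all_format_chars switch removes every character in the Unicode "Cf" (format) category instead, which catches more invisible characters but also strips soft hyphens and directional marks. Note that removing U+200D, which is in the default list, also splits emoji that are built from joined sequences.

```python
import unicodedata

# Assumed starting list of invisible code points; extend it as needed.
ZERO_WIDTH = (
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER (also used inside some emoji sequences)
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE / byte order mark
)

# str.translate() deletes every code point that maps to None.
_DELETE_TABLE = dict.fromkeys(map(ord, ZERO_WIDTH))


def strip_zero_width(text: str, all_format_chars: bool = False) -> str:
    """Remove zero-width characters; real question marks are untouched."""
    if all_format_chars:
        # Broader sweep: drop everything in the Unicode "Cf" (format) category.
        return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    sample = "c\u200ban you tel\u200b\u200bl?"
    cleaned = strip_zero_width(sample)
    print(cleaned, len(sample), len(cleaned))  # can you tell? 16 13
```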
Hmmm. When *I* copy and paste the text in Notepad++ (with Courier New as the font in the default Global style) and switch the encoding from UTF-8 BOM to ANSI, the zero-width characters display as a string of three characters:
â — U+00E2 — Latin Small Letter A With Circumflex [lowercase a circumflex]
€ — U+20AC — Euro Sign
‹ — U+2039 — Single Left-Pointing Angle Quotation Mark [left single French quotation mark]
except that the zero-width character in the word “I’ve” between the I and the apostrophe displays as:
â — U+00E2 — Latin Small Letter A With Circumflex [lowercase a circumflex]
€ — U+20AC — Euro Sign
™ — U+2122 — Trade Mark Sign
between the I and the v, *in place of* the apostrophe, which is missing.
Doing a “search and replace with nothing” for these two strings zaps the zero-width characters but takes out the apostrophe in “I’ve”.
Doing a “search and replace with apostrophe” (actually, either ʼ — U+02BC — Modifier Letter Apostrophe or ’ — U+2019 — Right Single Quotation Mark) for the second string zaps the zero-width character and leaves/restores the apostrophe in “I’ve”. I wonder just how many variations of “replacing combinations” of zero-width characters and printing characters there are…
Testing this made me revisit LibreOffice Writer, where I realized that while Writer highlights the *first* zero-width-character “string” and allows you to zap it with “find and replace,” it doesn’t flag the zero-width character between the I and the apostrophe in “I’ve” *at all*. That second zero-width-character string is left intact when you copy it from Writer and paste it elsewhere.
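A plausible explanation for these strings, sketched in Python below: in UTF-8, U+200B is stored as the bytes E2 80 8B and the curly apostrophe U+2019 as E2 80 99. Re-read those bytes as Windows-1252 (assumed here to be what Notepad++ calls "ANSI" on this system) and you get exactly â€‹ and â€™. If that is what happened, the second string is the mis-decoded apostrophe itself rather than a zero-width character, which would explain why replacing it with nothing removed the apostrophe, and possibly why Writer flags nothing inside “I’ve”. The function name as_ansi is made up for this example.

```python
def as_ansi(text: str) -> str:
    """Encode text as UTF-8, then misread the bytes as Windows-1252.

    Assumption: "ANSI" in Notepad++ maps to cp1252 on the tester's machine.
    Bytes that cp1252 leaves undefined would become U+FFFD.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


for label, text in [
    ("ZERO WIDTH SPACE", "\u200b"),
    ("RIGHT SINGLE QUOTATION MARK", "\u2019"),
    ("I've with a curly apostrophe", "I\u2019ve"),
]:
    shown = as_ansi(text)
    codes = " ".join(f"U+{ord(ch):04X}" for ch in shown)
    print(f"{label}: {shown} ({codes})")
```

Under this reading, a safer route is to strip the zero-width characters while the file is still treated as UTF-8, before switching the encoding.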
@Martin Brinkmann: It seems I’m no longer able to edit my posts. Did anything change on the site?
I turned the functionality off as it was buggy. I’m researching new options to reintroduce it.
Ah, I see. For my part, I’ve never had a problem with the editing function on your site, and I’ve been commenting infrequently for the last two years, I think. I hope you find the bug!
This technique is quite sneaky!!
I’ve pasted the text into TBird and the spaces do not show up, nor do emojis replace the spaces. However, if I invoke the spell checker, it does flag all the words with added hidden characters.
I’ve also pasted the text into Notepad & Word2000 and the text appears normal!!
I remember doing something very similar with DOS 3.3, using Alt+255.
If I remember my DOS correctly, adding an alt255 showed a space so it would not have been hidden… for example, I’ve added the alt255 here: spa ce.
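That matches how code page 437, the usual DOS code page, defines byte 255 (assuming that code page was active): it is a no-break space, which takes up the width of a normal space, unlike the zero-width characters discussed here. A quick check in Python:

```python
import unicodedata

# Alt+255 typed byte 0xFF; decode it with code page 437 (assumed active).
char = bytes([255]).decode("cp437")
print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+00A0 NO-BREAK SPACE

# "Zs" is a visible-width space separator; U+200B is a "Cf" format character.
print(unicodedata.category(char), unicodedata.category("\u200b"))  # Zs Cf
```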
Tried it in a gmail draft:
– Plain paste shows the icons
– Paste as text shows the icons
– Paste + remove formatting clears hidden paces and triggers speel checker.
Neat. Thanks Martin.
talking about spell check .. sry about that :)
It’s a great way to combat plagiarism. By inserting a few into the text on a website, you can find out whether somebody has simply copied and pasted it.
I’ve often noticed this when pasting into editors set to show hidden characters, but hadn’t given a thought to the possible origin – I will now!
I’m a long time reader but first time commenter; thank you for all you do for the community.
This is also a good technique to bypass word filters. I’ve seen websites where the word “screw” was blacklisted in the comments, so referring to either an actual screw or a screwdriver was impossible without having your comment thrown into moderation. Annoying.
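The flip side is that a filter can close this gap by removing zero-width characters before it looks for blocked words. A minimal Python sketch, assuming a simple substring filter like the one described (which is why “screwdriver” was caught too); the blocklist, the code point list, and is_blocked are all hypothetical:

```python
import re

# Hypothetical blocklist modelled on the comment above (substring matching).
BLOCKLIST = ("screw",)

# Assumed, non-exhaustive set of zero-width code points.
ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def is_blocked(comment: str, normalize: bool) -> bool:
    """Return True if a blocklisted word appears in the comment."""
    text = comment.lower()
    if normalize:
        # Remove hidden characters so "scr\u200bew" matches "screw" again.
        text = ZERO_WIDTH_RE.sub("", text)
    return any(word in text for word in BLOCKLIST)


sneaky = "Pass me the scr\u200bewdriver, please."
print(is_blocked(sneaky, normalize=False))  # False: the filter is bypassed
print(is_blocked(sneaky, normalize=True))   # True: stripping first restores the match
```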