How to detect Zero-Width Characters fingerprinting

Martin Brinkmann
Apr 5, 2018
Updated • Apr 5, 2018

All modern web browsers support zero-width characters. These characters can be added to text on a page without users noticing, because nothing reveals to the naked eye that the text contains additional characters.

British security researcher Tom Ross described how zero-width characters can be used to add a logged-in user's username to text that the user copies. The invisible information is included when the text is pasted elsewhere, and all it takes then is to run a check to reveal the hidden characters.

While the method is not suited to fingerprinting a user's general activity on the Internet, zero-width characters may be used to reveal the source of a leak or of leaked information.

The following text excerpt includes ten zero-width characters: F​or exam​ple, I’ve ins​erted 10 ze​ro-width spa​ces in​to thi​s sentence, c​an you tel​​l?

These characters are invisible to the eye, and they may not show up when you paste the copied text either. If you paste the text into an editor with spell-checking, however, you will notice that the spell checker flags words that look perfectly normal.

Whoever embeds the characters can easily avoid the spell-checking flags by adding them at the beginning or the end of words rather than in the middle.

Ross published a proof of concept in which he converted each user's username to binary, a sequence of zeros and ones, and then represented that sequence with zero-width characters.
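
To make the idea concrete, here is a minimal Python sketch of such an encoding. It is not Ross's code: the mapping of bit 0 to the zero-width space (U+200B) and bit 1 to the zero-width non-joiner (U+200C), the function names, and the sample username are all assumptions made for illustration.

```python
# Hypothetical mapping, chosen for illustration only.
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1


def encode_hidden(username: str) -> str:
    """Turn the username into bits, then into invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in username.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def decode_hidden(text: str) -> str:
    """Collect the invisible characters in text and rebuild the username."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def tag_text(visible: str, username: str) -> str:
    """Hide the encoded username right after the first word of the text."""
    first, space, rest = visible.partition(" ")
    return first + encode_hidden(username) + space + rest


original = "Quarterly figures are attached."
tagged = tag_text(original, "alice")
print(len(original), len(tagged))  # the tagged copy is longer but looks identical
print(decode_hidden(tagged))
```

The hidden payload sits at the end of the first word, which is the placement that avoids splitting a word in the middle.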

So, what can you do to detect if copied text includes zero-width characters?

You could paste the text into an editor that reveals these characters. Head over to DiffChecker and paste the text into the left text field on the site.

The site displays any zero-width characters in the pasted text immediately. If the text appears normal, it is clean.
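
If you prefer not to paste sensitive text into a website, a similar check can be run locally. The following Python sketch reports where hidden characters sit in a string. The set of code points is an assumption and is not exhaustive, and the function names are made up for this example.

```python
import unicodedata

# Assumed, non-exhaustive set of zero-width code points.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_zero_width(text):
    """Return (position, code point, Unicode name) for each hidden character."""
    return [
        (index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


def reveal(text):
    """Replace each hidden character with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(ch):04X}>" if ch in ZERO_WIDTH else ch for ch in text)


sample = "F\u200bor exam\u200bple"
print(reveal(sample))  # F<U+200B>or exam<U+200B>ple
for hit in find_zero_width(sample):
    print(hit)
```

An empty result from `find_zero_width` only means that none of the listed code points are present; other invisible characters would need to be added to the set.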

Another option that you have is to use the Chrome extension Replace zero-width characters with emojis.

The extension replaces any zero-width characters it detects on sites visited in Google Chrome with emoji when you activate it.

Just install the extension and click on its icon, and then on the "show me" button to reveal any hidden zero-width characters on the page.

You may want to activate the extension whenever you are about to copy text in situations where you don't want the pasted text to be traced back to you.

Closing Words

Zero-width characters are just the latest thing that Internet users need to keep an eye on when they are connected to the Web. (via Bleeping Computer)

Publisher
Ghacks Technology News

Comments

  1. Blank Sturgis said on May 12, 2022 at 4:52 pm

    Visiting this page in 2022.

    Latest version of Firefox shows those characters right here on the web page.

    “The following text excerpt includes ten zero-width characters: F​or exam​ple, I’ve ins​erted 10 ze​ro-width spa​ces in​to thi​s sentence, c​an you tel​​l?”

    Not sure if it’s stock Firefox or one of my add-ons doing it.

  2. Vrai said on April 8, 2018 at 2:29 pm

    Using the ‘Document Statistics’ tool in ‘gedit’ will show the difference if you copy-paste one and manually type the other. Doesn’t show where the different characters are but at least one would know something is there.

  3. Ray said on April 5, 2018 at 9:59 pm

    A user would have to use copy and paste a lot, so this seems like a very targeted use-case and not really applicable to the majority of users.

    1. A different Martin said on April 6, 2018 at 11:30 pm

      Target Audience: leakers/whistleblowers; investigative journalists/transparency advocates; privacy obsessives; and … plagiarists! ;-)

  4. Steve S. said on April 5, 2018 at 9:47 pm

    I tried an old-school method using the Windows 7 command prompt. Copy and paste the text into the command prompt and all the zero-width characters show up as question marks. Not of much use for editing, though!

  5. Jason said on April 5, 2018 at 8:02 pm

    I just tried pasting that sentence in LibreOffice and got interesting results.

    Instead of showing the red spelling error squiggle, the software actually places a grey highlight between the visible letters, right where the invisible ones are. It also makes the invisible letters appear as tiny / marks that I can delete (but whose font size I cannot change).

  6. Liz McIntyre said on April 5, 2018 at 7:39 pm

    Thanks Martin. Great article! Someone posted a link to it at reddit r/privacy, too: https://redd.it/89yyuw

  7. daveb said on April 5, 2018 at 6:41 pm

    ‘Copy Plain Text 2’ is an extension for Firefox/Pale Moon that works to strip these out as well.

    1. A different Martin said on April 5, 2018 at 8:23 pm

      @daveb: I just tested “Copy Plain Text 2” in Pale Moon, and it did NOT strip out the zero-width characters. I’m disappointed because I was hoping for a functionality upgrade from the Copy Plain Text function in “Extended Copy Menu (fix version)”….

      If you copy and paste (or copy-plain-text and paste!) Martin’s sample text into LibreOffice Writer (6.x, at least), the zero-width characters are indicated with gray highlighting that overlaps onto the surrounding regular-width characters. (The highlighting is similar to that used for nonbreaking spaces and hyphens.) You can zap all of the zero-width characters by selecting and copying one of them and using it as the “find” character in a global find & replace that replaces the found character with nothing.

      1. A different Martin said on April 5, 2018 at 8:25 pm

        Edit: … highlighting similar to … nonbreaking spaces and *nonbreaking* hyphens.

  8. RobertJ said on April 5, 2018 at 5:18 pm

    @archie:

    “plain paste” ?

    … I assume that pasting a text-copy-containing-0widthcharacters into a “plain text” text editor (like Windows Notepad) automatically deletes all 0widthcharacters from the original copy of text (?) Is that correct?

    1. Martin Brinkmann said on April 5, 2018 at 5:42 pm

      No, if you copy the text to Notepad and then check it using Diffchecker, you will notice that the characters are still included.

      1. non-zero width person said on April 5, 2018 at 10:10 pm

        This surprised me! For years I’ve used Notepad-type text editors to make plaintext as an intermediary step when copying HTML-formatted text for use in some other document. But I now realize that these zero-width characters can slip through!

        I’d love to see a follow-up post with more practical tips on how to deal with this issue. Which offline notepad/code editor tools can be used to quickly spot and, optionally, strip away these sneaky characters?

      2. non-zero width person said on April 6, 2018 at 10:33 am

        After more testing I notice that the zero-width characters are visible as question marks ? in editors like Notepad++ *if* we set the encoding to ANSI. But that still doesn’t make it easy to strip them all out, since a search and replace on ? would also remove real question mark characters.

        What is needed is a complete list of the Unicode code points for these zero-width characters and a small script that finds and removes all such characters from an input string.
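
A small script along those lines could look like the Python sketch below. The list of code points is an assumption that covers commonly cited zero-width characters and makes no claim to completeness, and the names `STRIP_RE` and `strip_zero_width` are made up for this example.

```python
import re

# Assumed list: zero-width space, non-joiner, joiner, word joiner, and the
# zero-width no-break space (byte order mark). Extend it as needed.
STRIP_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_zero_width(text: str) -> str:
    """Remove every listed zero-width character and leave the rest untouched."""
    return STRIP_RE.sub("", text)


dirty = "c\u200ban you tel\u200b\u200bl?"
clean = strip_zero_width(dirty)
print(len(dirty), len(clean))
print(clean)
```

Because the characters are matched by code point and not by how an editor happens to display them, real question marks survive. Keep in mind that the zero-width joiner and non-joiner have legitimate uses in some scripts and in emoji sequences, so stripping them everywhere may alter such text.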

      3. A different Martin said on April 6, 2018 at 6:16 pm

        Hmmm. When *I* copy and paste the text in Notepad++ (with Courier New as the font in the default Global style) and switch the encoding from UTF-8 BOM to ANSI, the zero-width characters display as a string of three characters:

        â — U+00E2 — Latin Small Letter A With Circumflex [lowercase a circumflex]
        € — U+20AC — Euro Sign
        ‹ — U+2039 — Single Left-Pointing Angle Quotation Mark [left single French quotation mark]

        except that the zero-width character in the word “I’ve” between the I and the apostrophe displays as:

        â — U+00E2 — Latin Small Letter A With Circumflex [lowercase a circumflex]
        € — U+20AC — Euro Sign
        ™ — U+2122 — Trade Mark Sign

        between the I and the v, *in place of* the apostrophe, which is missing.

        Doing a “search and replace with nothing” for these two strings zaps the zero-width characters but takes out the apostrophe in “I’ve”.

        Doing a “search and replace with apostrophe” (actually, either ʼ — U+02BC — Modifier Letter Apostrophe or ’ — U+2019 — Right Single Quotation Mark) for the second string zaps the zero-width character and leaves/restores the apostrophe in “I’ve”. I wonder just how many variations of “replacing combinations” of zero-width characters and printing characters there are…

        Testing this made me revisit LibreOffice Writer, where I realized that while Writer highlights the *first* zero-width-character “string” and allows you to zap it with “find and replace,” it doesn’t flag the zero-width character between the I and the apostrophe in “I’ve” *at all*. That second zero-width-character string is left intact when you copy it from Writer and paste it elsewhere.
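
The three-character strings described above can be reproduced with a short Python sketch. It assumes that "ANSI" in Notepad++ means the Windows-1252 code page, which is the usual case on Western European and US Windows systems but not everywhere, and the helper name `as_ansi` is made up for this example.

```python
def as_ansi(text: str) -> str:
    """Show how UTF-8 bytes look when misread as Windows-1252 (assumed ANSI)."""
    return text.encode("utf-8").decode("cp1252")


zero_width_space = "\u200b"   # the hidden character
right_single_quote = "\u2019"  # the curly apostrophe in "I’ve"

print(as_ansi(zero_width_space))    # a-circumflex, euro sign, single left angle quote
print(as_ansi(right_single_quote))  # a-circumflex, euro sign, trade mark sign
```

Under that assumption, the second string is what the curly apostrophe itself turns into, which would explain why replacing it with nothing also removes the apostrophe.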

  9. John in Mtl said on April 5, 2018 at 4:35 pm

    @Martin Brinkmann: it seems I’m no longer able to edit my posts. Did anything change on the site?

    1. Martin Brinkmann said on April 5, 2018 at 4:57 pm

      I turned the functionality off as it was buggy. I’m researching new options to reintroduce it.

      1. John in Mtl said on April 6, 2018 at 12:26 am

        Ah, I see. For my part, I’ve never had a problem with the editing function on your site, and I’ve been commenting infrequently for the last 2 years, I think. I hope you find the bug!

  10. John in Mtl said on April 5, 2018 at 4:34 pm

    This technique is quite sneaky!!

    I’ve pasted the text into TBird and the spaces do not show up, nor do emojis replace the spaces. However, if I invoke the spell checker, it does flag all the words with added hidden characters.

    I’ve also pasted the text into Notepad & Word2000 and the text appears normal!!

  11. Martin said on April 5, 2018 at 4:14 pm

    I remember doing something very similar to this with DOS 3.3, using Alt+255.

    1. John in Mtl said on April 5, 2018 at 4:30 pm

      If I remember my DOS correctly, adding an alt255 showed a space so it would not have been hidden… for example, I’ve added the alt255 here: spa ce.

  12. archie said on April 5, 2018 at 3:49 pm

    Tried it in a gmail draft:
    – Plain paste shows the icons
    – Paste as text shows the icons
    – Paste + remove formatting clears hidden paces and triggers speel checker.

    Neat. Thanks Martin.

    1. archie said on April 5, 2018 at 3:50 pm

      talking about spell check .. sry about that :)

  13. Anonymous said on April 5, 2018 at 2:37 pm

    It’s a great way to combat plagiarism. By inserting a few on a website you are able to find out if somebody has just cut and pasted.

  14. C said on April 5, 2018 at 2:04 pm

    I’ve often noticed this when pasting into editors set to show hidden characters, but hadn’t given a thought to the possible origin – I will now!

    I’m a long-time reader but first-time commenter; thank you for all you do for the community.

  15. Yuliya said on April 5, 2018 at 11:44 am

    This is also a good technique to bypass word filters. I’ve seen websites where the word “screw” was blacklisted in the comments, so referring to either an actual screw or a screwdriver was impossible without having your comment thrown into moderation. Annoying.
