What this tool does

Most text is exactly what it appears to be. But some characters are invisible: they render as nothing, or as something indistinguishable from a normal space, and silently change how your text behaves. They're a common source of bugs.

This tool highlights every suspicious character in your input and shows what it is by name and Unicode code point. It also produces a cleaned version with those characters removed or normalized.

What's "suspicious"

Zero-width characters

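Characters that take up no visual space at all. They have legitimate uses: the joiner and non-joiner control how adjacent letters connect in scripts such as Arabic, Persian, and Devanagari (and the joiner also builds emoji sequences), while the zero-width space and word joiner mark where a line may or may not break. In English text they usually arrive by accident, pasted from rich-text editors or AI-generated content. Common ones: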

U+200B — Zero-Width Space
U+200C — Zero-Width Non-Joiner
U+200D — Zero-Width Joiner
U+2060 — Word Joiner

You can't see them. They break word counts, regex patterns, search indexes, and string comparison. If your text "looks fine" but two strings that should be equal aren't, suspect a zero-width character.
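
As a rough illustration of that failure mode, here is a minimal Python sketch that locates the four zero-width characters listed above. The function name find_zero_width and the sample strings are hypothetical, chosen only for this example; it is not the tool's actual implementation.

```python
# Code points taken from the list above.
ZERO_WIDTH = {
    "\u200b": "Zero-Width Space",
    "\u200c": "Zero-Width Non-Joiner",
    "\u200d": "Zero-Width Joiner",
    "\u2060": "Word Joiner",
}


def find_zero_width(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each zero-width character."""
    return [
        (i, f"U+{ord(ch):04X}", ZERO_WIDTH[ch])
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


clean = "admin"
dirty = "ad\u200bmin"  # usually renders the same as "admin"

print(clean == dirty)          # False
print(len(clean), len(dirty))  # 5 6
print(find_zero_width(dirty))  # [(2, 'U+200B', 'Zero-Width Space')]
```

Comparing lengths is often the quickest first check: if two strings look the same but their lengths differ, something invisible is in one of them.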

BOM (Byte Order Mark)

U+FEFF at the start of a file or string. It was designed to signal byte order in UTF-16 and UTF-32, and some text editors still prepend it to UTF-8 files, where byte order is not an issue. In modern UTF-8 contexts, it's usually unwanted: it breaks JSON parsing in strict parsers, shows up as a stray character in some renderings, and confuses string comparison.
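
The JSON failure can be reproduced with Python's standard library. This is a sketch under the assumption that the input arrives as UTF-8 bytes with a leading BOM; the helper strip_bom is a hypothetical name used only here.

```python
import json

raw = b'\xef\xbb\xbf{"ok": true}'  # UTF-8 bytes with a leading BOM

# Plain UTF-8 decoding keeps the BOM as U+FEFF at index 0.
text = raw.decode("utf-8")
print(text[0] == "\ufeff")  # True

try:
    json.loads(text)
except json.JSONDecodeError as err:
    print("parse failed:", err.msg)

# The "utf-8-sig" codec drops a leading BOM if one is present.
print(json.loads(raw.decode("utf-8-sig")))  # {'ok': True}


def strip_bom(s: str) -> str:
    """Remove a single leading U+FEFF from an already-decoded string."""
    return s[1:] if s.startswith("\ufeff") else s


print(json.loads(strip_bom(text)))  # {'ok': True}
```

Decoding with "utf-8-sig" is the simpler route when you control how the bytes are read; strip_bom covers the case where the string has already been decoded.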

Non-breaking and weird spaces

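Characters that look like regular spaces but aren't. The no-break variants (U+00A0, U+2007, U+202F) prevent a line break at that position; the others differ from a regular space only in width. All of them fail string equality with a regular space (U+0020) and break regex patterns that match a literal space.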

U+00A0 — Non-Breaking Space (NBSP)
U+2007 — Figure Space
U+202F — Narrow No-Break Space
U+2009 — Thin Space
U+2003 — Em Space
U+3000 — Ideographic Space (Japanese/Chinese)
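
One way to normalize these in Python is a translation table that maps each listed code point to U+0020. This is a sketch, not the tool's own code; the names SPACE_LIKE, TO_SPACE, and normalize_spaces are made up for illustration.

```python
# The six space-like code points from the list above.
SPACE_LIKE = "\u00a0\u2007\u202f\u2009\u2003\u3000"

# Translation table: each space-like character becomes a regular space.
TO_SPACE = str.maketrans({ch: " " for ch in SPACE_LIKE})


def normalize_spaces(text: str) -> str:
    """Replace the listed space-like characters with U+0020."""
    return text.translate(TO_SPACE)


price = "10\u00a0000"  # looks like "10 000"

print(price == "10 000")                    # False
print(normalize_spaces(price) == "10 000")  # True
```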

Smart quotes and em-dashes

Microsoft Word, Apple Pages, and most modern word processors auto-convert straight quotes to curly quotes and double-hyphens to em-dashes. This looks better in display, but it breaks code, breaks programmatic text comparison, and produces output that doesn't match the input you intended.

U+2018, U+2019 — Left/right curly single quotes
U+201C, U+201D — Left/right curly double quotes
U+2013 — En-dash
U+2014 — Em-dash
U+2026 — Ellipsis (instead of three periods)

The cleaned output normalizes these back to their ASCII equivalents (straight quotes, hyphens, three periods). An em-dash becomes a double hyphen.
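
A comparable normalization can be sketched in Python with str.translate. The mapping below is an assumption based on this page: the em-dash becomes a double hyphen, as described in the highlighting section, while mapping the en-dash to a single hyphen is a guess. The names SMART_TO_ASCII and normalize_punctuation are hypothetical.

```python
# Typographic character -> ASCII replacement.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en-dash (assumed: single hyphen)
    "\u2014": "--",   # em-dash (double hyphen)
    "\u2026": "...",  # ellipsis
})


def normalize_punctuation(text: str) -> str:
    """Replace smart quotes, dashes, and ellipses with ASCII equivalents."""
    return text.translate(SMART_TO_ASCII)


sample = "\u201cdon\u2019t\u201d \u2014 wait\u2026"
print(normalize_punctuation(sample))  # "don't" -- wait...
```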

Control characters

Non-printable characters in the ASCII control range (U+0000 through U+001F, excluding tab/newline) and the C1 control range (U+0080 through U+009F). These shouldn't appear in normal text. When they do, they're usually a sign of binary data leaking into a text stream.
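
A regular expression over those two ranges is one way to detect such characters. In this sketch, carriage return (U+000D) is left alone together with tab and line feed; that is an assumption, since the text above names only tab and newline. The names CONTROL_CHARS, find_controls, and strip_controls are illustrative.

```python
import re

# C0 controls except tab (09), line feed (0A), and carriage return (0D,
# an assumption), plus the C1 range U+0080 through U+009F.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x80-\x9f]")


def find_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for each control character found."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in CONTROL_CHARS.finditer(text)
    ]


def strip_controls(text: str) -> str:
    """Remove the control characters matched by CONTROL_CHARS."""
    return CONTROL_CHARS.sub("", text)


sample = "name\x00\tvalue\x1b[0m\n"
print(find_controls(sample))         # [(4, 'U+0000'), (11, 'U+001B')]
print(repr(strip_controls(sample)))  # 'name\tvalue[0m\n'
```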

How the highlighting works

Each suspicious character is rendered as a visible label inside a highlighted span. The label shows what the character is (e.g., "Zero-Width Space") and hovering over it shows the full Unicode code point. Without the label, these characters would be invisible.

The "cleaned" version (which you copy or download) has each suspicious character either removed (zero-width, control) or replaced with its sensible equivalent (NBSP → space, smart quote → straight quote, em-dash → double hyphen).
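
The label-and-clean idea can be approximated in a few lines of Python. This sketch prints bracketed text labels rather than highlighted spans with hover text, and its small RULES table is a hypothetical subset written for this example, not the tool's real rule set.

```python
# character -> (label shown to the reader, replacement in cleaned output)
RULES = {
    "\u200b": ("Zero-Width Space", ""),
    "\ufeff": ("Byte Order Mark", ""),
    "\u00a0": ("Non-Breaking Space", " "),
    "\u2019": ("Right Single Quote", "'"),
    "\u2014": ("Em-Dash", "--"),
}


def annotate(text: str) -> str:
    """Replace each suspicious character with a visible bracketed label."""
    parts = []
    for ch in text:
        if ch in RULES:
            label, _ = RULES[ch]
            parts.append(f"[{label} U+{ord(ch):04X}]")
        else:
            parts.append(ch)
    return "".join(parts)


def clean(text: str) -> str:
    """Remove or replace each suspicious character according to RULES."""
    return "".join(RULES[ch][1] if ch in RULES else ch for ch in text)


sample = "it\u2019s\u00a0here\u200b"
print(annotate(sample))
# it[Right Single Quote U+2019]s[Non-Breaking Space U+00A0]here[Zero-Width Space U+200B]
print(clean(sample))  # it's here
```

Keeping the label and the replacement in one table means the highlighted view and the cleaned output cannot drift apart: a character is either in both or in neither.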

Common use cases

Debugging mysterious bugs — when two strings that should match don't, paste them in and see what's actually there
Cleaning content from AI tools — ChatGPT, Claude, and other LLMs occasionally insert zero-width characters or smart quotes
Preparing text for code or configuration — strip hidden characters and smart punctuation before pasting into a source or config file
Sanitizing copy from Word documents — smart quotes and em-dashes are auto-inserted by Word
Detecting hidden Unicode in usernames — some scams use zero-width characters to make impersonator accounts
Verifying input from forms — users sometimes paste content with hidden characters that break validation

Why this matters more in 2026

AI-generated text has dramatically increased the rate at which unusual Unicode appears in everyday content. Many LLMs emit smart quotes, em-dashes, and occasionally zero-width characters in their output. Text that looks identical to a stored original can still fail a string comparison against it because of a single invisible character.

This tool is the diagnostic step. Paste suspicious text, see what's actually there, and either clean it or know to handle it differently.

Privacy

All character detection happens locally in your browser. The text you paste never leaves your machine.