What this tool does
Pasting HTML from a webpage, an email signature, or a rich-text editor often delivers a mess of nested tags, inline styles, and HTML entities mixed with the actual content. This tool strips the tags and leaves the readable text behind. It adds line breaks at block elements, decodes entities, and offers an optional allowlist for tags you want to preserve.
Everything runs in your browser. The HTML you paste isn't sent anywhere.
How it handles different elements
Block elements preserve line breaks
By default, the tool inserts a newline at every <br> and at the boundary of every block-level tag, such as <p>, <div>, <h1> through <h6>, and <li>. This keeps the visual structure of the original content: paragraphs stay paragraphs, headings stay on their own lines, and list items stay separated.
Uncheck "Preserve line breaks from block tags" if you want everything compressed into a single line. This is occasionally useful for extracting prose from heavily structured HTML where you'll re-flow the text manually.
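The line-break behaviour described above can be sketched in Python with the standard library's html.parser. This is an illustrative approximation, not the tool's actual code: the BLOCK_TAGS set, the strip_tags name, and the choice to collapse blank lines are all assumptions.

```python
from html.parser import HTMLParser

# Assumed set of tags that trigger a line break; the tool's real list may differ.
BLOCK_TAGS = {
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
    "br", "li", "ul", "ol", "tr", "blockquote",
}


class BlockAwareStripper(HTMLParser):
    """Collects text and records a newline at each block-tag boundary."""

    def __init__(self, preserve_breaks=True):
        super().__init__(convert_charrefs=True)
        self.preserve_breaks = preserve_breaks
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        raw = "".join(self.parts)
        if not self.preserve_breaks:
            # Equivalent of unchecking the option: one line, single spaces.
            return " ".join(raw.split())
        # Normalize spaces inside each line and drop empty lines
        # (an assumption; the tool may keep blank lines between blocks).
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def strip_tags(html, preserve_breaks=True):
    parser = BlockAwareStripper(preserve_breaks)
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<h1>Title</h1><p>First <b>para</b>.</p><ul><li>One</li><li>Two</li></ul>"
print(strip_tags(sample))
print(strip_tags(sample, preserve_breaks=False))
```

With the option on, each heading, paragraph, and list item in the sample lands on its own line. With it off, the same text comes back as one space-separated line.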
Scripts and styles are removed entirely
Anything inside <script> and <style> tags is stripped completely, including the content between the opening and closing tags. This prevents JavaScript and CSS from leaking into your output. HTML comments are also removed.
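As a rough illustration of dropping script, style, and comment content, here is a Python sketch built on html.parser. The class and function names are hypothetical, and the tool itself may do this differently in the browser.

```python
from html.parser import HTMLParser

# Elements whose contents are discarded along with the tags themselves.
SKIPPED = {"script", "style"}


class ContentStripper(HTMLParser):
    """Keeps text nodes, except those inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    # handle_comment is deliberately not overridden: HTMLParser's default
    # implementation ignores comments, so they never reach the output.


def strip_markup(html):
    parser = ContentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html_in = (
    "<p>Hello</p>"
    "<script>var a = '<b>';</script>"
    "<style>p { color: red; }</style>"
    "<!-- hidden note -->"
    "<p>World</p>"
)
print(strip_markup(html_in))  # prints "HelloWorld"
```

The depth counter is what separates this from a plain tag stripper: deleting only the tags would leave the JavaScript and CSS source in the output as text.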
HTML entities get decoded
By default, entities like &amp;, &lt;, &quot;, and &nbsp; are converted back to their character equivalents (&, <, ", and a non-breaking space). This is usually what you want, since raw entities in plain-text output are almost always a mistake.
Uncheck "Decode HTML entities" if you specifically need to preserve the encoded form, for instance when the output will be re-inserted into HTML.
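Entity decoding can be illustrated with Python's standard html module. The decode_entities wrapper and its decode flag are hypothetical stand-ins for the checkbox, not the tool's own code.

```python
import html


def decode_entities(text, decode=True):
    """Return text with entities decoded, or untouched when decode is False."""
    return html.unescape(text) if decode else text


encoded = "Fish &amp; chips &lt;3 &quot;ok&quot;&nbsp;now"

# Decoded form: &nbsp; becomes the non-breaking space character U+00A0.
print(repr(decode_entities(encoded)))

# Encoded form, left as-is so it is still safe to place back into HTML.
print(decode_entities(encoded, decode=False))

# Going the other way: html.escape re-encodes &, <, > and quotes.
print(html.escape('a < b & "c"'))  # prints: a &lt; b &amp; &quot;c&quot;
```

Decoding is a single pass, so a double-encoded sequence such as &amp;lt; comes out as &lt; rather than <.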
Allowlist for tags you want to keep
The "Keep tags" field accepts a comma-separated list of tag names you want to preserve in the output. Useful patterns:
- a: keep links so the user can still click through
- strong, em: keep basic emphasis markup
- code, pre: keep code formatting
- a, strong, em, code: the "Markdown-friendly" set
The opening and closing tags are preserved verbatim, including any attributes. So <a href="https://example.com"> stays as-is when a is in the allowlist.
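A minimal sketch of the allowlist idea, again in Python with html.parser. The names parse_keep_list and keep_tags are made up for illustration and the tool's real logic is not shown in this page. Text entities are decoded by the parser here, and closing tags are re-emitted in normalized form rather than copied byte for byte.

```python
from html.parser import HTMLParser


def parse_keep_list(field):
    """Turn a comma-separated field such as 'a, strong' into a set of names."""
    return {name.strip().lower() for name in field.split(",") if name.strip()}


class AllowlistStripper(HTMLParser):
    """Drops every tag except those named in the keep set."""

    def __init__(self, keep):
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # get_starttag_text() returns the tag exactly as written,
            # so attributes such as href survive unchanged.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form (for example <br/>): emit it once, no end tag.
        if tag in self.keep:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)


def keep_tags(html, keep_field):
    parser = AllowlistStripper(parse_keep_list(keep_field))
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


source = '<p>See <a href="https://example.com">the <strong>docs</strong></a>.</p>'
print(keep_tags(source, "a"))
# prints: See <a href="https://example.com">the docs</a>.
```

Passing "a, strong" instead keeps the <strong> pair as well, while the surrounding <p> is still removed.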
Common use cases
- Cleaning copy from a CMS — paste HTML from a WordPress or Drupal export and get plain text suitable for re-flowing in a new template
- Extracting text from a webpage — paste a page's HTML and get the prose without navigation, ads, or footer content
- Email-to-plain-text conversion — paste an HTML email and get the text body without the styling
- RTF / Word paste cleanup — paste from a rich-text editor and strip the inline formatting that comes with it
- Stripping markup for word counting — get accurate word counts on content that's currently full of tags
- Preparing text for AI tools — many LLMs work better with clean text than HTML-laden input
What it doesn't do
This is a text stripper, not a Markdown converter. If you want to convert HTML to Markdown (preserving structure with # headings, * bullets, etc.), use a dedicated HTML-to-Markdown tool instead. This tool removes structure rather than translating it.
It also doesn't render HTML — there's no preview of what the original looked like. The right panel shows the stripped text, not a rendered version.
Privacy
All processing happens locally in your browser via the DOM parser. The HTML you paste isn't transmitted anywhere. Once you close the tab, the content is gone.
For batch processing, our bulk converter accepts .docx uploads (which store their content as XML markup, much as a web page stores its content as HTML) and applies the same kind of cleanup at scale.