Markup is the New Markdown

TL;DR: “HTML is the new Markdown” is an attention-grabbing headline (for some of us), but not something to adopt at face value or without more context. Where HTML applies, it genuinely delivers. Where it doesn’t, you’ll just be paying more per token for the privilege of being wrong.


Here’s something I catch myself doing constantly with AI content: skim the headline, fill in the details based on my own context, and run with a conclusion the original post may or may not have intended. I’m not the only one. It’s not laziness; it’s a cognitive defense mechanism in a world full of content, not to mention the extra hours we spend keeping up with all the tools that are supposed to save us hours.

This one is definitely that.

Earlier this month, Thariq Shihipar, an engineer at Anthropic, posted nine words on X: “HTML is the new markdown.” The post linked to a companion site with 20 self-contained .html files that an agent produced instead of the usual Markdown output. It pulled 8,600+ likes and 11,000 bookmarks. Simon Willison publicly reconsidered his three-year Markdown default. The Hacker News thread was climbing past 30 points an hour.

The reaction was big and fast, which is a sign that many people didn’t read the whole thing or understand the full context (ahem, “vibe coding”).

So. Let’s look at what this shift actually means, where it applies, how to apply it, and what it does to your token bill.

My HTML Baggage (Relevant, I Promise)

I mastered HTML in the early 2000s. Semantic structure, tag vocabulary, clean markup from scratch. The whole deal. So when Markdown started getting traction among developers in the 2010s, my first reaction was skepticism: why learn a format whose entire job is to produce a subset of what I can already write directly?

The pragmatic case eventually won me over, and it wasn’t even close. By the early 2020s, documentation had moved decisively into Git. Design docs, specs, ADRs, READMEs, changelogs: all .md. GitHub renders it natively. It reads as plain text, commits cleanly, and it became the shared syntax of developer collaboration. Fighting it meant fighting a current that wasn’t going to reverse, so I stopped. The ubiquity was the feature.

I tell you that because when I saw Thariq’s post, I had already settled in my mind that Markdown is how you communicate with AI, so the headline made no sense to me. Which brings me back to that bad habit of skimming headlines. What I should have asked first was: the new Markdown for what?

What Thariq Was Actually Pointing At

The argument isn’t that Markdown is dead or that you should rewrite your documentation in HTML. It’s narrower and more specific than that.

Thariq’s 20 examples grouped HTML wins into categories of LLM output: project status reports, code reviews, diagnostic summaries, data comparisons. The things an agent produces that a human then has to read, navigate, and act on. When one researcher ran all 20 prompts through Claude in both formats, HTML won 17 of the 20 head-to-head comparisons. The 3 cases where Markdown held its own were tasks where the output stays internal to an agent’s loop and never reaches a human at all. (Source.)

Once a person is the end consumer, HTML’s richer vocabulary starts earning its overhead. Collapsible sections. Semantic structure. Tabbed layouts. Inline labels. Color-coded status. Things Markdown has no syntax for, because Markdown was never designed to produce navigable deliverables. It was designed to produce readable plain text.
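To make that vocabulary concrete, here is a minimal Python sketch (an illustration, not one of Thariq’s 20 examples) that turns a list of tasks into a self-contained HTML page. Each task becomes a collapsible `<details>` section with a color-coded status label in its `<summary>`. The function name, status names, and colors are all hypothetical choices for the example.

```python
from html import escape

# Hypothetical status-to-color mapping, chosen only for this illustration.
STATUS_COLORS = {"done": "#2e7d32", "at risk": "#f9a825", "blocked": "#c62828"}


def render_status_report(title: str, items: list[tuple[str, str, str]]) -> str:
    """Render (task, status, details) rows as one self-contained HTML page.

    Each task is a collapsible <details> block whose <summary> carries a
    color-coded status label: two things Markdown has no syntax for.
    """
    sections = []
    for task, status, details in items:
        color = STATUS_COLORS.get(status, "#555555")  # grey for unknown statuses
        sections.append(
            "<details>\n"
            f'  <summary><span style="color:{color};font-weight:bold">'
            f"[{escape(status.upper())}]</span> {escape(task)}</summary>\n"
            f"  <p>{escape(details)}</p>\n"
            "</details>"
        )
    body = "\n".join(sections)
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en"><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>"
    )


if __name__ == "__main__":
    page = render_status_report(
        "Sprint 14 status",
        [
            ("Migrate auth service", "done", "Cut over on Tuesday; no incidents."),
            ("Billing export", "blocked", "Waiting on finance sign-off for the schema."),
        ],
    )
    print(page)
```

Save the output as a `.html` file and any browser renders it with no build step, which is the property that makes a single file a usable deliverable.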

LLMs have also been trained on billions of HTML pages, so the semantics of those tags are deeply embedded in how these models understand and produce structure. That doesn’t go away just because Markdown became the default output convention.

For human-readable LLM output, HTML deserves a serious look. That part of the headline holds up.

Where It Does Not Apply

This is where the skimming gets expensive.

For input to an LLM, Markdown is still the right default, and by a wide margin. Markdown uses dramatically fewer tokens than HTML for equivalent content. A Cloudflare analysis found that the Markdown version of a typical blog post used 80% fewer tokens than its HTML counterpart. In RAG pipelines, Markdown-formatted inputs have been shown to boost accuracy by up to 35% while cutting token costs by 20 to 30%. On structured tasks like table extraction, Markdown outperforms HTML at roughly 60.7% accuracy versus 53.6% in GPT-based evaluations. (Source.)

Worth noting: Profound ran a controlled experiment across 381 pages on 6 websites to test whether serving Markdown to AI crawlers versus HTML made a meaningful difference in bot traffic. The result was a marginal directional advantage for Markdown (~16% mean lift) that wasn’t statistically significant. (Source.) Which is to say, well-formed HTML isn’t incomprehensible to LLMs. But when you’re paying per token, the math still favors Markdown clearly.
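If you want a quick feel for where the overhead comes from, the sketch below compares the same content in both formats using a deliberately crude proxy: every run of word characters and every punctuation mark counts as one “token.” This is an assumption for illustration only. It is not how Cloudflare measured, and real BPE tokenizers split text differently, so for actual numbers use your model provider’s own tokenizer.

```python
import re

# Crude stand-in for a tokenizer: one "token" per run of word characters
# and one per individual punctuation mark. Directional only.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def rough_token_count(text: str) -> int:
    return len(TOKEN_PATTERN.findall(text))


MARKDOWN = """## Results

- **Latency**: 120 ms
- **Errors**: 0.2%
"""

HTML = """<section class="results">
  <h2 id="results">Results</h2>
  <ul>
    <li><strong>Latency</strong>: 120 ms</li>
    <li><strong>Errors</strong>: 0.2%</li>
  </ul>
</section>
"""

if __name__ == "__main__":
    md, html = rough_token_count(MARKDOWN), rough_token_count(HTML)
    print(f"markdown={md} html={html} saving={1 - md / html:.0%}")
```

Almost all of the difference is tag names, angle brackets, and attributes: structure the model has to read (and you have to pay for) without adding any information.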

For documentation in repositories, nothing about Thariq’s observation changes the picture. Markdown’s native rendering in GitHub and GitLab, its readability as plain text, and its role as the standard syntax of developer documentation are not touched by this argument. If your docs live in Git and humans need to read and edit them, Markdown is still the answer. Full stop.

The Token Bill Reality

This deserves its own section because it’s where the “HTML is back!” take gets most dangerous most quickly.

The token efficiency gap between Markdown and HTML is real and large. 80% fewer tokens for equivalent content isn’t a rounding error. At any meaningful scale, that’s a direct line to your API costs. HTML earns that overhead only when the output is rich enough, and human-facing enough, to justify it.

If your workflow involves long context windows, high-volume RAG retrieval, or large amounts of text being ingested or passed between agents, the format you choose for that content has a real cost consequence. Thariq’s post is not an argument for switching to HTML across the board. Applied without that nuance, it’s an expensive misread.
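The arithmetic is simple enough to sketch. The workload and price below are hypothetical placeholders, not quoted rates: 5,000-token HTML documents, 10,000 of them a day, at an assumed $3 per million tokens, compared against a Markdown version that is 80% smaller. Providers usually price input and output tokens separately, so plug in whichever rate applies to your side of the pipeline.

```python
def monthly_token_cost(tokens_per_doc: int, docs_per_day: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Cost of a month of traffic at a flat per-token price (hypothetical inputs)."""
    total_tokens = tokens_per_doc * docs_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    html_tokens = 5_000              # assumed size of one HTML document
    md_tokens = html_tokens // 5     # the "80% fewer tokens" case
    price = 3.0                      # assumed USD per million tokens
    html_cost = monthly_token_cost(html_tokens, 10_000, price)
    md_cost = monthly_token_cost(md_tokens, 10_000, price)
    print(f"HTML: ${html_cost:,.0f}/month  Markdown: ${md_cost:,.0f}/month")
```

Under those assumptions the same workload costs $4,500 a month in HTML and $900 in Markdown. The ratio holds at any scale; only the absolute dollars change.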

The Framework That Actually Helps

Four scenarios, four answers:

A human writes context and feeds it to a model: Markdown.
A model produces output that stays inside an agent loop: Markdown.
A model produces a deliverable a human will read, navigate, and act on: HTML is worth the token cost.
Documentation lives in a repository: Markdown, full stop.

The headline “HTML is the new Markdown” is accurate for exactly one of those four. The other three haven’t changed.
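If you want to bake that rule of thumb into an agent pipeline, the four scenarios reduce to a few lines of Python. This is a sketch under assumptions: the function and parameter names are hypothetical, and a real system would derive these flags from its own routing logic.

```python
from enum import Enum


class Format(Enum):
    MARKDOWN = "markdown"
    HTML = "html"


def choose_format(*, produced_by_model: bool, human_reads_output: bool,
                  lives_in_repo: bool) -> Format:
    """Encode the four-scenario rule of thumb (hypothetical helper).

    - Documentation in a repository: Markdown, regardless of who wrote it.
    - Human-written context fed to a model: Markdown.
    - Model output that stays inside an agent loop: Markdown.
    - Model output a human will read, navigate, and act on: HTML.
    """
    if lives_in_repo:
        return Format.MARKDOWN
    if produced_by_model and human_reads_output:
        return Format.HTML
    return Format.MARKDOWN


if __name__ == "__main__":
    print(choose_format(produced_by_model=True, human_reads_output=True,
                        lives_in_repo=False))
```

Note that HTML is the only branch that needs two conditions to be true at once, which is the whole point: it is the exception, not the new default.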

Thariq’s post isn’t a verdict. It’s a recalibration for a specific use case. The fact that it spread the way it did says less about the content and more about how hungry people are for permission to do the thing they already half-wanted to do.

I’m not pointing fingers. I had the same instinct.



© Scott S. Nelson
