Meet Deadlines and Manage Technical Debt with AI-Assisted Architecture

tl;dr: New platform. Deadline. The instinct is to move fast and clean it up later. That’s where technical debt is born. A well-constructed Claude project, loaded with curated platform documentation and queried with the experience to know what to ask and how to evaluate the responses, tactically compresses the ramp-up without sacrificing strategic design principles.


The Sharp Fork in the Road

Every architect and engineering lead who has given a project to deliver on a new platform or with new technology under deadline pressure knows this fork. Pushing for proper preparation can get you marked (ironically) as a risk from the leadership perspective. Plowing forward using old techniques without understanding the new nuances keeps you up at night…either knowing you are missing something up front, or fixing what you didn’t know during the final death-march phase of a waterfall project that just happens to use Kanban boards, daily stand ups, and sprint ceremonies.

One path: move fast. Learn just enough to ship. Ask support when you hit a wall. Request exceptions when you hit limits. Get it working and tell yourself you’ll revisit the architecture when there’s more time. (There is never more time.) What you build in that mode becomes the foundation everything else is built on, and the cost of fixing it compounds with every sprint.

The other path: slow down. Read the documentation properly. Understand the platform’s constraints before you design around them. Make the right call the first time. This is correct and often impractical. Deadlines are real. The platform is new. The documentation is dense. The team is waiting.

The Contentstack project that prompted this post took the first path and ran into a SaaS governance constraint that happens to be measured recursively. The first time it was hit, the response was typical for teams working with a new SaaS vendor and release date that was set before the first line of code was written: Ask for an exception. Which was granted, hit again and raised again. Fortunately, the third time it happened, an experienced vendor support manager recommended reviewing best practices to avoid the issue. And an experienced architect was on the receiving end of that suggestion, one who had previously dealt with a Salesforce solution that went down three months after launch from relying on similar exceptions.

This post is not about Contentstack architecture. It is about the challenge many teams face with balancing target dates and defensive design decisions, and a tool set to apply in order to keep from tipping too far in either direction.


Claude as a Platform Research Partner

Giving Claude access to a curated set of platform documentation and then working interactively to explore solutions is not a replacement for architectural experience. It is an accelerant for it. It is also not a way to do away with architects or the inclusion of design tasks at the feature or story level. It is how to fulfill the expectation that AI can provide ROI immediately when applied by experienced technologists.

These distinctions matter. It’s never about “ask Claude what to do” (because if you need to ask “what” you won’t know how to ask “how” when the time comes). It is “I understand how systems like this behave, I know which constraints are likely to compound, and I need to move through the analysis faster than I could alone.” Experienced architects and engineers bring the judgment: familiarity with how content models fail at scale, how schema resolvers typically handle recursion, how vendor-imposed limits usually reflect real constraints in the underlying system. Claude brings the recall, the scripting, the cross-referencing, and the tireless patience for the kind of recursive schema analysis that would take a senior engineer the better part of a day.

For those that follow my posts you know that I will often describe theoretical solutions backed by a combination of personal experience where they would have worked linked to examples from others who demonstrated that they work. In this case the experience came before the theory, working backwards from a result where I noticed the process while documenting the solution (because, hey, that is what architects do after they solve something).

The working example was with a Contentstack implementation. It took one focused 2-hour session to identify an obscure root cause, define a strategic solution, discover other areas to apply the same solution, and identify where the solution would cause more harm than good. A second 30-minute session was applied after the first round of refactoring to validate the impact and prioritize the remaining effort. Before Generative AI, this would have been several days of effort that would not have been attempted until the risk was realized in production delay.


The Project is the Architecture

Before a single question gets asked, the project has to be built. This is not setup overhead. This is the work.

A blank Claude chat window and a well-constructed project will give you very different results on the same question. The difference is not the AI. It is the knowledge boundary, the taxonomy, the instructions, and the accumulated session output. Strip those away and you have a general-purpose assistant guessing at context. Keep them and you have something that behaves like a senior researcher who has been on the project for months.

What goes in the project folder:

Downloaded documentation as markdown files, not links. Links go stale, require fetches, and introduce latency. Pull the platform docs that matter, save them as markdown, put them in the folder. For Contentstack: the Global Fields limitations page, the Content Modeling Best Practices guide, the Custom Fields documentation. Not every page in the docs. The ones relevant to the work. Knowing which ones matter is the first place architectural experience shows up.

Actual data from the platform. In this case, exported stack JSON. Claude can read it directly in the sandbox, run scripts against it, and cross-reference findings against the loaded documentation in the same session. That combination of curated docs and live data is what makes the diagnosis precise instead of speculative.

Session summaries. After each working session, have Claude produce a structured summary: the original problem, the conclusions, the evidence, the next steps. That file becomes the cold-start document for the next session. You don’t re-explain the context. You hand Claude the prior session’s output and continue. The knowledge compounds.

At some point (again, much of this requires human intuition gained through real-world experience), have Claude work with you to turn the summaries into a skill scoped to the specific platform, technology, or tool so that when they are in context these lessons learned will be applied automatically going forward.


The Taxonomy Is Not an Afterthought

Separate downloaded reference content from working session output. Nest folders by topic. /reference/, /sessions/, /data/ serve different purposes and should live in different places. This is not pedantry. It is how you make the project instructions work correctly, and how you find things six weeks later without rebuilding context from scratch.

If the platform has extensive documentation, don’t try to enumerate allowed URLs in the project instructions directly. Create a reference-urls.md, or per-topic files like contentstack-docs-urls.md, with an annotated, categorized list of approved sources. Claude works from the list. You maintain the list. It stays current and searchable.

The discipline compounds the same way the session summaries do. A well-organized project from session three makes session fifteen faster than session one.


The Project Instructions Are the Rules of Engagement

The instructions define how Claude behaves inside this knowledge space. Three things they need to do:

Challenge assumptions. If a question implies something not supported by the loaded documentation, say so. Don’t fill gaps with plausible-sounding answers. The most dangerous thing a research assistant can do is answer confidently on insufficient evidence. This instruction eliminates a whole category of hallucination risk before it starts.

Point out mistakes. If the framing of a problem is wrong, say so. This is the instruction most people skip and then complain about later. You want an assistant that pushes back, not one that validates your bad hypothesis and helps you build a case on sand.

Limit web searches to specific URLs. Unconstrained web search in a technical investigation introduces noise: outdated content, inconsistent sourcing, SEO-optimized answers that aren’t accurate. Lock it down. Specify which domains are permitted. For a Contentstack project, that’s contentstack.com/docs. Everything else requires explicit permission. If the approved URL list is long, store it in a markdown file in the project folder and point the instructions at it.


This Requires an Architect

Here is the part that does not get said enough.

You cannot point Claude at an unfamiliar platform, load a few docs, and expect it to diagnose architecture problems. You can try. What you’ll get is fluent, confident, and partially wrong.

There are many engineers capable of setting this up. The value of an architect doing the work is separation of concerns in roles. The architect’s role is to nail down processes and choices that allow engineers to focus on the best way to apply them.

In our Contentstack use case, the single session worked because the person directing it brought a deep understanding of adjacent technologies and the experience to know both what to ask and how to evaluate the responses. Specifically:

  • Recognizing that the error message pointed to a schema limit, not a code problem, because that’s how content platform resolvers typically surface constraint violations
  • Understanding that “recursive” in the documentation meant multiplicative compounding, not additive, based on how similar systems handle nested references
  • Knowing the fix had to leave the content model intact for authors, which ruled out several otherwise obvious approaches
  • Reading a Claude-generated Python script’s output and recognizing that the confident result provided the first time was due to looking in the wrong parts of the schema
  • Looking at a before/after instance table and determining whether the fix was actually complete or just moved the problem

None of that knowledge lives in the documentation itself. It transfers in from adjacent experience: content modeling, schema design, how platform resolvers work under the hood. Claude surfaces the platform-specific detail. The architect determines what it means.

The tool doesn’t replace experience. It supercharges it with speed and specific knowledge.


The Interaction Pattern

What the Contentstack session actually looked like, stripped of the platform specifics:

  1. State the problem. Provide the evidence: the error message, the exported schema, the documentation.
  2. Claude generates a hypothesis. Test it against the data.
  3. Diagnostic script written and run in the sandbox.
  4. Root cause confirmed. Fix designed. Impact predicted before any schema changes are made.
  5. Fix implemented. Follow-up session loads the new export and verifies the result.
  6. Summary file created. Next session’s candidates identified.

No magic. An architect with relevant adjacent experience, a fast and patient research partner, and a well-stocked project folder.


Prompts That Did Actual Work

These are worth examining because the techniques transfer to any platform.

“Describe in detail the cause of home_page_template having 24 instances, and instances of what?”

The second half of that question is the important part. Asking Claude to define what it is counting before giving the count forces precision on both sides. In technical sessions on an unfamiliar platform, jargon can mask shallow understanding without anyone noticing until the fix doesn’t work. The ability to ask that follow-up, to know that “instances” needed a definition before the number meant anything, comes from having debugged similar problems elsewhere. Use this pattern whenever an answer could be technically correct but operationally ambiguous.

“Create a summary file to feed to the next analysis session that includes the conclusions from this session combined with the original inputs. Format and sequence the file so that the next session can be as efficient as possible.”

Besides being familiar with adjacent technology, experience solving complex issues with Generative AI is why this is an approach for architects and engineers. Yes, Claude will now start compacting sessions on its own to improve efficiency, but having the sense that it is time to move to a new session is again an area where human experience beats relying entirely on the AI.

This prompt converts a working session into a durable asset. The phrase “format and sequence for efficiency” is carrying real weight: it tells Claude to think about how the file will be consumed, not just what it contains. The output becomes the cold-start document for the next session. Without it, every session re-derives context the previous one already established.

“Read the attached to get full context of the original issue, then review the contents of [folder] and determine if and how the issue has been improved.”

Sequencing does the work here. Claude gets the full prior-session summary before it touches the new data, so “improved” arrives with a precise definition attached. Without that order, it analyzes the new export without knowing what it’s comparing against. Prime with context before assigning the task, every time.

All three follow the same pattern. Context before task. Output format stated up front. It is not a methodology. It is just how you would brief a colleague who needs to be useful on short notice.


The Setup Is the Differentiator

Two teams, same platform, same error.

Team A has Claude. No curated project, no loaded docs, no taxonomy, no instructions. They get generic answers that feel helpful until they don’t hold up under the actual constraints of the platform.

Team B has a project built by someone with deep experience in adjacent technologies, content modeling, schema design, API behavior under constraint, who knows both what to ask and how to evaluate what comes back. Downloaded reference docs. Exported platform data. Session summaries that carry forward. Instructions that push back on bad assumptions.

Team B gets a root cause analysis, a fix, and a forward-looking roadmap. More importantly, they get it without accumulating the kind of structural debt that shows up six months later as an emergency.

A Note about Choosing Cowork

What I’m describing is not the typical use case for Claude’s project-based workspace. It is aimed at knowledge workers automating routine tasks: organizing files, generating reports, drafting communications. Productivity stuff. This is not that.

My choice of Cowork is based on my day-to-day work being mostly in documents and decks. This could also likely be done using Claude Code in an IDE for those that prefer that interface.

I became aware of how far outside the lines I was operating when someone asked what tool I was using, I explained it, and I watched the look on their face. You know the look.

I have been here before. I spent years using JMeter for continuous functional and regression API testing, which is not what JMeter is for. JMeter is a load and performance testing tool, and there are entire communities of people who will tell you this. They are correct and also missing the point, because once you understand how JMeter handles realistic randomized inputs and configuration-driven test selection, you end up with one codebase doing the work of four. I wrote about it. People told me I was doing it wrong. The tests kept passing, so.

It is common to analogize the similarities between physical tools and technical tools. “When all you have is a hammer, everything looks like a nail”, and “You can use a screwdriver as a chisel, but you really shouldn’t.” I’ve often used those myself. But the opposite analogies are also true. Most tools can be a weapon, and many tools can have multiple uses. While screwdrivers are still terrible chisels, some are great prybars, hole punches, and, yes, weapons. Same with software. Excel has spellcheck, but I’d never paste text into it before posting to a blog, but I have used formulas to parse text rather than writing a script to apply regex rules because it is faster and just as accurate. Use your tools to the extent of their value, and don’t underestimate their value or your ability to innovate.

If you found this interesting, please share.

© Scott S. Nelson

The Gold Rush Was Never Just About Gold

TL;DR: Most people who chased the Gold Rush didn’t know what they were getting into. They saw headlines about fortunes and stories about how easy it was. Many went because their livelihoods were already threatened. Sound familiar?


Let’s be honest about who the average Gold Rush prospector actually was.

Not a rugged adventurer with a prospecting education and a solid savings account. Not someone who had studied geology or mapped the terrain. The typical forty-niner was a farmer whose crops had failed, a tradesman who had lost his shop, or a clerk who had read a breathless newspaper account and decided a long-shot bet beat a certain slow decline.

The California Gold Rush of 1848 and the Klondike rush of 1896 were separated by nearly fifty years and thousands of miles, but they drew from the same well: economic desperation dressed up as opportunity.

The context matters here, because without it the behavior doesn’t make sense.

The years leading up to the California rush included a global recession following the Panic of 1837, crop failures across the Midwest, and a population of young men with limited options. When James Marshall found gold at Sutter’s Mill in January 1848, the news didn’t just spread quickly, it spread selectively. The people who acted first were the ones who needed it most. Same story in 1896, when word of the Klondike strike reached Seattle and San Francisco during a prolonged economic depression that had pushed national unemployment past 20 percent. The ships heading north were not full of people with a plan. They were full of people with a problem.

Not everyone was running from something. Some were adventurers who wanted something different, or already had a good life and wanted something better. And not everyone coming out of a bad situation went in blindly. What almost everyone had in common were expectations that diverged sharply from how things turned out.

The relevant point is not that these people were reckless. It’s that economic pressure meant the average participant arrived undercapitalized, underprepared, and motivated primarily by someone else’s story of overnight success. They were chasing a headline, not a thesis. The results reflected that, in aggregate, almost immediately.

That pattern matters because it is not a 19th-century phenomenon. It is what every hype cycle looks like from the inside.

Each rush also moved in distinct waves. The rules that determined who succeeded in the first wave had almost nothing in common with what it took to win in the second. Most people who got swept up never stopped to ask which wave they were actually in. That question turned out to matter more than almost anything else.


First Wave: Right Place, Right Time, Right Creek

The first wave of California gold hunters had a genuine advantage. Here is what that advantage actually was. Not superior skill. Not better research. Proximity to the news.

Many of the earliest California prospectors were already in the territory: soldiers, settlers, and tradespeople who heard about Marshall’s discovery within weeks and moved fast. The surface deposits in the Sierra Nevada foothills were accessible, concentrated, and required almost no expertise to extract. A pan, a creek, and a willingness to stand in cold water for twelve hours were the main requirements. In that environment, showing up early mattered more than showing up prepared.

The Klondike told a similar first-chapter story. The initial claims along Bonanza and Eldorado Creeks were staked by prospectors already in the Yukon when George Carmack’s group made their discovery in August 1896. They were not the product of a coordinated strategy. They were in the right place when the right thing happened.

First-mover advantage is real. The people who moved fast in that window got a return no amount of later preparation could have replicated. But the window was short, the geography was finite, and it closed before most people had even heard the news.


Second Wave: The Pan Is Not Going to Save You

By 1852, the dynamics of the California Gold Rush had fundamentally changed. The surface deposits were gone. The creek beds that had yielded fortunes with a simple sluice box were picked clean by the first wave. The second wave arrived to find a very different landscape than the one the newspaper stories had described.

The prospectors who succeeded in the Second Wave did so through entirely different means. Hydraulic mining operations used high-pressure water jets to blast entire hillsides and process material through sluices, yielding gold at scale but requiring capital investment and systematic planning. Geologically-informed prospectors who understood quartz reef formations studied where gold veins actually formed and discovered productive sites where random panning had repeatedly failed. Syndicates pooled resources to fund deep shaft mines that reached deposits unreachable by individual surface workers.

Preparation was no longer an advantage. It was the entry requirement.

The Klondike replicated this pattern almost exactly. By the time the mass wave arrived in 1898 after a brutal trek over the Chilkoot Pass, which the Canadian government required each prospector to complete while carrying a year’s worth of supplies, the accessible claims were long staked. The prospectors who completed that crossing and still found nothing with a pan were not unlucky. They were late, and they were underprepared for the wave they had actually entered.

This is also where technology shows up on both sides of the ledger. The Industrial Revolution had already been displacing Eastern tradespeople and artisans for a generation, which goes a long way toward explaining why those gold rushes had the human fuel they did. Factory looms had replaced hand weavers. Steam-powered equipment had displaced skilled craftsmen. The Gold Rush was, in no small part, a downstream consequence of technological disruption seeking an economic escape valve. And then, within the rushes themselves, industrial technology, hydraulic systems, and organized mining operations began displacing the individual prospector. The image of the lone miner with a pan was already obsolete while people were still forming it.


Gold Wasn’t the Only Thing in Them Thar Hills

Some prospectors did strike it rich. The early arrivals at Coloma, the men who staked Bonanza and Eldorado before the word spread, the syndicates that scaled hydraulic operations with enough capital to actually move mountains. These were real winners. Gold was there. People found it. Fortunes were made.

But a parallel economy was running alongside the prospectors, quieter in the moment and, in the long run, more durable.

Sam Brannan did not own a gold claim. He owned a hardware store, and before he told anyone about the gold discovery, he bought up every pick, pan, and shovel in Northern California he could find. Then he walked through San Francisco holding a vial of gold dust, shouting about gold from the American River. He became California’s first millionaire. He did not find a single ounce himself.

Levi Strauss did not mine. He figured out that miners destroyed pants at an extraordinary rate and needed something that could survive the work. He made pants. Generational brand.

Wells Fargo did not mine. They moved money and packages for people who did. They are still here.

The common thread is not that these people were smarter than the prospectors. It is that they studied what the prospectors would certainly need rather than betting on where the gold might be. The uncertain bet was “this particular creek has gold.” The certain bet was “whoever finds the gold will need pants, tools, and a way to move money.” One of those bets required luck. The other required observation.

This path was available in the First Wave and Second Wave equally. It did not depend on timing. It scaled with the rush rather than competing within it. And it generated more durable wealth than almost anyone who was actually in the river.


The Roaring 20’s

Not the flapper and speakeasy era. This is the era of data centers and solopreneurs; dueling model metrics and learning evaluations; digital assistants evolving into personal agents and agentic automation that builds new automation agents. Billion-dollar funding rounds for companies that did not exist three years ago. Job titles that nobody had in 2021, now listed as critical hires. Entire industries trying to figure out if they are the disrupted or the disruptors, and running low on time to decide.

Models released on a Monday that are obsolete by Friday. Consultants who barely knew what a prompt was in 2022, now billing as AI transformation architects. Boardrooms demanding AI strategies before anyone has agreed on what problem they are solving. Vendors with “AI-powered” on the label whether the product has meaningfully changed or not.

The energy is real. The stakes are real. And unlike some previous cycles, so is the underlying technology.

The dot-com boom was real too. It produced Amazon, Google, and the infrastructure of the modern internet alongside thousands of spectacular failures. The AI shift is already demonstrating measurable productivity gains across industries, and the underlying technology is improving faster than most predictions have accounted for. Dismissing it as pure hype is the wrong read, and the people making that call loudest will look exactly like the analysts who declared the internet a fad in 1997.

The problem is not that people are excited about a real thing. The problem is that when real opportunity appears, it activates the same psychological patterns that sent underprepared people over a mountain pass in 1898. The gold rush mentality does not require the gold to be absent. It just requires the promise of gold to be louder than the instructions.

The opportunity is real. The question is whether you are building toward it, or just rushing toward it.


The AI First Wave Already Happened

From roughly 2022 through 2023, companies that moved aggressively into AI-native product development, workflow automation, or customer-facing AI features got real first-mover advantage: lower competition, compounding productivity gains, and a learning curve head start that is genuinely hard to close. Some of this was vision. Some was access. Some was timing. The window was real, and the returns were real.

Most businesses did not catch it. Large organizations move slowly by design, and procurement cycles are not calibrated for technology windows that last 18 months. That is not a criticism. It is a description of how large organizations actually work. (I have been in those rooms. Guilty.)

What it means is that most businesses are now in the Second Wave, whether they have acknowledged that or not.


Second Wave Requires a Different Playbook

The companies treating AI adoption as a First Wave problem in 2025 and 2026 are showing up in California in 1852 with a pan. The accessible value has been captured. What remains requires the methodical approach.

Imagine you could see exactly where your organization loses an hour a day to rework, manual handoffs, and decisions made on bad data. That is what a process audit produces. It is not glamorous. It does not show up in the conference keynote. But it is the difference between knowing where the gold is and hoping the next creek looks promising.

Start there, not with tool selection. Map where time, money, and errors concentrate in your current operations. Identify which problems AI can address with reasonable reliability, and which ones it will make worse by hallucinating confidently inside a business-critical workflow. Run contained pilots with defined success criteria before scaling anything. Build internal AI literacy and governance at the same time you build capability, not after something goes wrong publicly.

Then, only after you understand what AI can reliably do in your specific context, start redesigning processes to take advantage of it rather than bolting it onto what already exists. The order matters. Inverting it is how you end up running hydraulic equipment you do not know how to operate into a hillside you have not assessed.

[True story placeholder: add an example of a project or initiative where the stated plan and the available path did not match, and what it cost to discover that. A rollout, a migration, or a vendor implementation where the “easy button” turned out not to exist.]

Preparation is not glamorous. But it is the entry requirement now. That distinction matters.


The Niche Play Nobody Is Talking About

Here is the thing about Sam Brannan, Levi Strauss, and Wells Fargo: none of them would have been described as gold rush companies.

Brannan was a merchant. Strauss was a dry goods trader. Wells Fargo was an express and banking operation. The Gold Rush was the economic context that made their businesses thrive and scale, but their identity was not “gold rush business.” Their success was driven by the rush. They were not of it.

While the gold rush era was a boon to the merchant class, imagine if technology had been more advanced then. Gold is one of the most effective electrical conductors on earth. It does not corrode. It does not tarnish. It carries signal reliably in conditions that defeat most other materials. Today it is in every smartphone, every circuit board, every aerospace connector, and every implantable medical device. The miners panning those California creek beds were sitting on the raw material for the digital age and had no way to know it. They were chasing the obvious use. The compounding value was in applications that had not been invented yet.

AI is playing the same role for business processes right now, visible to anyone paying attention. It is the super conductor of this moment, not for electrons but for decisions, workflows, and the intelligence buried inside operations that were built for a different era. And just as the real gold economy grew around refining, transporting, and applying the metal rather than simply extracting it, the real AI economy is growing around discovering, implementing, and refining how AI connects to the work that organizations actually do.

Every organization trying to adopt AI will need clean, well-governed data. They will need people who can actually work alongside these tools rather than just technically access them. They will need integration between new AI capabilities and legacy systems that were built for a different era. They will need expertise in figuring out which processes actually benefit from AI involvement and which ones just look like they should.

None of that requires building a foundation model. None of it requires a large AI research budget. All of it requires observation, the same skill that made Sam Brannan wealthy while everyone else was panning creeks.

The businesses that build toward serving those needs may never be described as AI companies. They will be managed service providers, training firms, systems integrators, compliance consultants, data governance specialists. The AI boom will be the context that defines their era, even if it is not the label on their door.

That is not the consolation prize. That is the long game, and it has the most reliable odds.


Your Actual To-Do List

Three questions worth answering honestly before the next AI initiative.

Which wave are you actually in? If you are evaluating AI tools for general business adoption in 2025 or 2026, you are in the Second Wave. The First Wave is not waiting. Adjust your expectations and your approach accordingly.

Are you prospecting or supplying? If you are using AI to improve your own operations, you are prospecting. If you are building toward serving the certain needs AI adoption creates in your industry, you are supplying. Both are valid strategies with very different playbooks.

Are you auditing before you automate? The methodical prospectors of the Second Wave studied the geology before they dug. The equivalent is understanding your current processes, your data quality, your organizational readiness, and your actual use cases before purchasing a platform and announcing an AI strategy.

The Gold Rush did not reward the desperate or the hasty at scale. It rewarded the timely, the prepared, and the observant, in that order, depending on which wave you caught.

The AI boom is running the same playbook. The question is not whether the opportunity is real. It is whether you are building toward it the right way.


WTW Influence Note

Principles applied from Words_That_Work_Reference.md: – Brevity (Rule 2): TL;DR tightened from v2. The long opening paragraph of “The Setup” was split into three shorter paragraphs for better pacing. – Visualization (Rule 8): “Imagine you could see exactly where your organization loses an hour a day to rework, manual handoffs, and decisions made on bad data” in the Second Wave playbook section. Also, the gold-as-conductor passage in The Niche Play gives a concrete physical image before the abstract AI transition. – Aspiration (Rule 7): Forward-looking close added to “The Roaring 20’s.” “That is not the consolation prize” reframes the long game in The Niche Play. Final closing line oriented toward opportunity rather than risk. – Novelty (Rule 5): The gold-as-conductor bridge in The Niche Play section gives readers a genuine “I never thought of it that way” moment, connecting a familiar historical asset to its modern technological applications before pivoting to AI. – Context before claim (Rule 10): The Niche Play now builds through the gold-conductor frame before making the AI claim, rather than asserting the parallel directly.

Not applied / deferred to Scott’s voice: – Personalize and Humanize: WTW recommends specific named individuals in the reader’s demographic. Scott’s voice uses historical examples instead, and the Brannan/Strauss/Wells Fargo structure does that work more effectively for his audience. – Positive beats negative (full inversion): WTW would push harder toward leading with the upside throughout. Scott’s anti-hype voice depends on naming the problem first. Aspiration applied only at section closes.

If you found this interesting, please share.

© Scott S. Nelson

Clearing AI Adoption Bottlenecks: Lessons from Highway Planners

TL;DR: Traffic researchers discovered that adding more road often makes congestion worse, not better. Most AI rollouts are doing exactly that. The fix is the similar to what highway departments figured out decades ago: change behavior first, then worry about capacity.


I have spent more hours of my life commuting than I care to remember, and I have mixed feelings about how it (this is not about WFH vs RTO, which I also have some ambivalence about). OTOH, I can always think of things that would feel more productive. OTOH, the mental autopilot leaves room for solutions that eluded me during the working day. It is also a good time to contemplate paradigm shifts as they play out: from paper to digital, from MVC to SOA, from on-premise to cloud, and now everything(?) to AI.

The transitions that stick share a pattern. Not a hype arc. A pressure arc. The system resists, adapts, then acts like it was always this way. Different technology, same dynamics.

A lot like how people behave on the highway.

Deliberate Slowing to Speed Things Up

Transportation researchers spent years collecting data on traffic flows, tracking volumes before and after road expansions, mapping where congestion formed and how fast it returned. When Gilles Duranton and Matthew Turner analyzed the numbers across US cities, what they found ran counter to the prevailing assumption. A one percent increase in highway capacity produced almost exactly a one percent increase in driving (among other things). Add a lane, and within a few years congestion is back where it started, sometimes worse. They named it The Fundamental Law of Road Congestion. The instinct to build more road was not just ineffective. It was making the problem worse.

Separate research produced an equally counterintuitive result. In 2008, Yuki Sugiyama and colleagues put 22 cars on a circular track and told everyone to hold a steady speed. No merges, no accidents, no bottleneck. Yet above a certain density, a jam appeared out of nowhere and rippled backward through the pack. One driver braked slightly, the car behind overcorrected, and the wave propagated. A traffic jam with no external cause. The fix was not more road. It was more deliberate driving: leave a gap, anticipate, resist the urge to overcorrect.

These findings changed practice. Highway departments that once defaulted to expansion started investing in variable speed limits, ramp metering, and traffic calming measures. Smaller interventions, aimed at behavior rather than capacity, moved more cars through at lower cost. The road mattered less than how people used it.

Same Jam, Different Road

The parallel is direct, and I have watched it play out from both sides of the table. MIT’s Project NANDA found that after roughly $30 to $40 billion in enterprise AI spending, about 95 percent of organizations saw no measurable impact on the bottom line, with only around 5 percent of pilots producing real revenue. That is not a rounding error. It is the Fundamental Law of Road Congestion applied to a software budget: thirty to forty billion dollars worth of new lanes, and most of the cars are barely moving (or heading in the wrong direction).

The organizations stalling out follow a common pattern: tools deployed before workflows get redesigned; licenses purchased before anyone has defined what problem they are solving; metrics devised to measure what was done over what is possible. (That last reminding me of my favorite quote of contested origin.) When results disappoint, the leadership of organizations struggling with AI adoption initiatives either pull back everything or double down with a broader mandate and no clearer strategy. Both reactions make the jam worse. Adding capacity without fixing the underlying process is the detour. The congestion moves, but it does not clear.

The phantom jam dynamic shows up here too, and it spreads faster than any highway bottleneck. One over-tasked leader reads a discouraging headline, taps the brakes, and suddenly the whole initiative is under review. Or a competitor ships something flashy and someone stomps on the gas with a company-wide mandate before anyone is ready or knows where to go. The density of anxiety crosses a threshold, and the shockwave does the rest. Nothing structural changed. Behavior caused the jam, and only behavior can smooth it.

Where the Congestion Actually Forms

The real bottlenecks in AI adoption are rarely where struggling enterprise leadership looks for them. The tool is not usually the problem. The problem is everything around the tool: unclear ownership, undefined success criteria, and a workforce that was handed a license with no guidance on what problem it was supposed to solve. I have seen teams buy Copilot seats for every developer in the org and then measure success by activation rate. They got activation. They did not get output. Those are not the same thing, and conflating them is how you burn a year and come back to the next planning cycle with nothing to show for it.

There is also a shadow traffic problem that nobody talks about enough. When the official AI rollout is too slow, too restricted, or too vague, people route around it. They use personal ChatGPT accounts. They paste sensitive data into consumer tools. They build their own prompts in the gaps the IT department did not anticipate. This is not rebellion. It is adaptation. It is what happens when capable people hit a congestion point and look for the on-ramp the original road designers missed. The workaround is a signal, not a discipline problem. Ignoring it does not make it stop. It just makes it invisible.

Governance is the infrastructure that never gets funded until something goes wrong. Who is accountable when the model is confidently wrong? What happens to the output when the underlying model changes? Which data is allowed in, and which is not? These are not legal abstractions. They are the guardrails that let the rest of the system move faster. Organizations that build them early spend less time recovering from incidents and more time compounding on the investment. Skipping governance to move faster is the merge lane strategy. It feels efficient right up until everyone is stopped.

The Mandate That Made It Worse

Everett Rogers mapped how innovations spread through a population decades before generative AI existed: early adopters first, then the majority, then the laggards. The laggards are rarely the problem. They are often waiting for the road to be built around the tool, clear governance, reliable data, a documented sense of who is accountable when something goes sideways. Mandating faster adoption without building that infrastructure does not accelerate the curve. It creates congestion earlier in the journey, and the shockwave from that early jam takes longer to resolve than the time you thought you were saving.

Organizations that lead with training and strategic framing before deployment consistently outperform those that lead with usage mandates. When people understand what a tool is for, what it does well, and where it falls short, they use it in ways that compound over time. When they are handed a tool and told to use it more, they find ways to hit the metric without changing how they actually work. Activity goes up. Value does not follow.

Incentives tied to outcomes, paired with genuine investment in skills and strategy, produce something different: people who understand where they are going and why the tool helps them get there.

Getting Somewhere

Better roads move more people than bigger roads. The organizations getting real returns from AI are not the ones with the most licenses. They are the ones with the clearest processes, the best-trained people, and a strategy that connects the tool to an actual destination.

Define what success looks like before you deploy. Name the problem before you buy the solution. That clarity reduces friction for everyone involved, and it makes the detours worth something when they happen, because the detours will happen. The unexpected use case, the team that figured out something no roadmap would have suggested, the finding that reframes the whole initiative: those are not failures of planning. They are what happens when capable people have good tools and room to move. The goal is not to prevent detours. It is to be in good enough shape to recognize a promising one when it appears, rather than sitting too stuck to turn.

The lane was never the problem.

(Next up: the joys of reading on public transportation.)

If you found this interesting, please share.

© Scott S. Nelson

50 First Prompts

TL;DR: LLMs do not remember anything between calls. Every “conversation” you’ve ever had with one was reconstructed from scratch by replaying history into the context window. If your architecture treats memory like a feature you turn on, you will pay for it twice: once in token spend, and once in the slow erosion of consistency that has your users playing Henry Roth, re-establishing context every morning so Lucy can function. And yes, I often use humorous analogies, so please subscribe or follow (or un-) according to your tastes.


If you have not seen 50 First Dates, the premise is that Lucy Whitmore (Drew Barrymore) wakes up every day with no memory of anything that happened the day before, and Henry Roth (Adam Sandler) has to remind her of their entire relationship, every morning, forever. Sweet movie. Terrible AI pattern (in most cases).

True story: when I went to see this one in the theater, the projector died about twenty minutes in. It was weeks before we made it back to finish it, and the second viewing had this faint déjà vu quality, the film meeting me halfway while I reconstructed the rest from a partial memory. Something humans do automatically (if unreliably) and LLMs can’t, at least on their own.

The movie plot is also a reasonable analogy of how a Large Language Model works under the hood. The LLM is Lucy. Every developer who builds on top of it is Henry. Every API call is the first call. Every conversation is reconstructed from a transcript that the application hands the model on the way in. The model itself remembers nothing. The illusion of continuity is something your application is doing on its behalf, on every turn, at your expense.

Most teams do not build for this. They build as if “the AI” remembers things, get surprised when it doesn’t, bolt on a memory layer that is tested like a deterministic automation, and then watch their token bill quietly compound. We’ve all heard some horror stories about this happening. It’s why enterprises prefer to use vendor tools and outside consultants. Which is a good way to get up and running, but has its own cost if the relationship isn’t built on trust and reciprocal ROI.

The Architecture Reality Behind the Humorous Analogy

LLMs are stateless. Full stop. The model is a function: tokens in, tokens out. Whatever “memory” you experience in ChatGPT, Claude, Gemini, or your own agent is some other system managing the flow of prior context back into the prompt before the model sees it.

This has three implications that drive everything else:

First, there is no “the conversation.” There is a transcript that gets re-sent every turn. The model is not pulling up your last message; you are handing it back, every time.

Second, the context window is the entire universe of what the model knows in that moment. Anything not in that window does not exist. Anything in that window is being paid for, in tokens, on every single call.

Third, “memory” in vendor marketing rarely means one thing. It is a category that includes at least five different mechanisms with different costs, different failure modes, and different retrieval semantics. Conflating them is how you end up with an expensive system that still forgets the user’s name. There are, however, better ways.

Memory Is a Marketing Word

When a vendor or framework says “memory,” they could mean any of the following, and the differences matter:

Conversation history replay. The full transcript, prepended on every call. Simple, perfect recall, terrible cost curve. Linear in turns, eventually crashes into your context limit.

Running summary. A compacted version of the transcript, regenerated periodically. Cheaper, lossy, drifts over time. The model is now reading its own paraphrase of what happened, with all the small infidelities that implies.

Vector retrieval (RAG over chat history). Past turns are embedded and indexed; only relevant snippets get pulled into the next prompt. Cheap, scalable, but only as good as your embeddings and your retrieval thresholds. It will confidently fail to surface the one thing the user expected it to remember.

Structured profile / entity store. Key-value or graph storage of facts about the user, product, or domain (“user’s tone preference: dry,” “preferred billing currency: USD”). Cheap to read, easy to audit, but only as good as the extraction logic that populates it.

Procedural / skill memory. Instructions, playbooks, or skills the agent loads on demand. Closer to “here is how we do things here” than “here is what you said yesterday.” Different beast entirely.

A reliable and practical AI memory architecture uses several of these in combination. A bad one picks one and pretends it covers everything. If your team is having an argument about “should we add memory,” the real argument is which of these five you are talking about; why it is the best choice in a given context; and when the context and best option changes.

What Lost in the Middle Actually Costs You

Even if you stuff the entire history into the context window, you do not get what you think you are paying for. Liu et al. at Stanford published Lost in the Middle: How Language Models Use Long Contexts in 2023, and the finding has been replicated enough times that it should be a load-bearing assumption in any architecture: model attention is not uniform across the context window. Information at the beginning and end gets used. Information in the middle gets quietly ignored, even by models that advertise long-context support.

So the naive “just give it the whole history” approach is doubly bad. You pay for every token, and the model uses some of them less than others, and you have no easy way to tell which.

This is one of the reasons selective retrieval beats full replay almost everywhere. You are not just saving tokens. You are putting the relevant tokens in positions where the model will actually use them.

The Token Bill (Yes, Again)

Here is the part that gets glossed over in the demos.

Every token in your context window is paid for, every turn. If your “memory” is “we keep prepending the full conversation,” then by turn 50 you are paying for tokens 1 through 49 fifty times over, and the model is working harder to find the signal each time. This is the closest thing to a structural cost trap in LLM architecture, and it is almost always invisible in development because nobody runs 50-turn conversations against the dev key.

Anthropic’s prompt caching, introduced in August 2024, helps for the parts of your context that genuinely repeat (system prompts, fixed instructions, large reference documents): cached read tokens cost about 10% of the standard input price. That is real money saved on the parts that don’t change. But caching is not memory. It does not summarize, retrieve, or forget. It just makes paying for the same prefix cheaper. Use it where it fits, but do not let “we turned on caching” stand in for an actual memory strategy.

Memory architecture is cost architecture. They are the same conversation. Any team treating them separately is going to be surprised by one of them.

Patterns That Actually Earn Their Keep

A few that hold up in production (as of this writing, a caveat that I’m guilty of not always stating, and how you should think about everything you read about AI):

Hierarchical / paged memory. MemGPT (Packer et al., 2023) is the canonical paper here: a small “main context” of hot facts plus a larger “external context” the model can page in and out, modeled on operating-system virtual memory. Even if you never use the framework (now continued as Letta), the mental model is the right one. Most context is cold most of the time. Stop paying to keep it warm.

Compaction at boundaries. Summarize aggressively at natural breakpoints (session end, topic change, day rollover). Throw away the verbatim transcript once the structured summary is written. Track what got compacted so you can audit later if a user complains the model “forgot.”

Structured extraction over raw recall. Pull stable facts (preferences, identifiers, decisions) out of conversation into a structured store. Read those on every turn. Let the conversational history age out. The user’s preferred tone of voice does not need to live in 12,000 tokens of transcript.

Retrieval over replay. Index past turns, retrieve only what is relevant to the current input, accept the occasional miss as a cost of doing business. Tune your retrieval thresholds with the same seriousness you tune any other production query.

Skills and procedural memory as a separate tier. “How we do things” is not the same as “what we said.” Keep them in separate stores with separate update rules. Skills change rarely; episodic facts change constantly.

A Practical Framework

Four scenarios, four answers:

A user opens the same chat tomorrow and expects continuity: structured profile plus retrieval over summarized history. Do not replay the full transcript.

An agent loops on a long-running task: hierarchical memory with compaction at step boundaries. Hot working set stays small; cold context pages out.

A system prompt or large reference document is reused on every call: prompt caching. Cheap, easy, do it today.

A model needs to “know how we do things”: procedural / skill memory in its own tier. Keep it separate from episodic memory so updating one doesn’t disturb the other.

The wrong answer in all four cases is “just send the whole history.” That is the architecture equivalent of walking Lucy through the entire relationship from scratch, every morning, in hopes that this time some of it sticks. Romantic in the movie. Expensive in production.

Paddling off into the Sunset

The model forgets. That is not a bug, that is the current limitation of the art. The work is in deciding what your application remembers, where it stores it, when it retrieves it, and what it costs you per turn. Treat memory as architecture and most of the surprises go away.


Sources:

If you found this interesting, please share.

© Scott S. Nelson

Markup is the New Markdown

TL;DR: “HTML is the new Markdown” is an attention-grabbing headline (for some of us), but not something to adopt at face-value, or without more context. Where HTML applies, it genuinely delivers. Where it doesn’t, you’ll just be paying more per token for the privilege of being wrong.


Here’s something I catch myself doing constantly with AI content: skim the headline, fill in the details based on my own context, and run with a conclusion the original post may or may not have intended. I’m not the only one. It’s not laziness, it’s a cognitive defense mechanism in a world full of content, not to mention extra work hours keeping up with all the tools that are supposed to save us work hours.

This one is definitely that.

Earlier this month, Thariq Shihipar, an engineer at Anthropic, posted nine words on X: “HTML is the new markdown.” The post linked to a companion site with 20 self-contained .html files that an agent produced instead of the usual Markdown output. It pulled 8,600+ likes and 11,000 bookmarks. Simon Willison publicly reconsidered his three-year Markdown default. The Hacker News thread was climbing past 30 points an hour.

The reaction was big and fast, which is a sign that many people didn’t read the whole thing, or understand the whole context (ahem, “vibe coding).

So. Let’s look at what this shift actually means, where it applies, how to apply it, and what it does to your token bill.

My HTML Baggage (Relevant, I Promise)

I mastered HTML in the early 2000s. Semantic structure, tag vocabulary, clean markup from scratch. The whole deal. So when Markdown started getting traction among developers in the 2010s, my first reaction was skepticism: why learn a format whose entire job is to produce a subset of what I can already write directly?

The pragmatic case eventually won me over, and it wasn’t even close. By the early 2020s, documentation had moved decisively into Git. Design docs, specs, ADRs, READMEs, changelogs: all .md. GitHub renders it natively. It reads as plain text, commits cleanly, and it became the shared syntax of developer collaboration. Fighting it meant fighting a current that wasn’t going to reverse, so I stopped. The ubiquity was the feature.

I tell you that because when I saw Thariq’s post, I had already settled in mind that markdown is how to communicate with AI and this made no sense to me.  Back to that bad habit of skimming headlines. What I should have done first was ask: back for what?

What Thariq Was Actually Pointing At

The argument isn’t that Markdown is dead or that you should rewrite your documentation in HTML. It’s narrower and more specific than that.

Thariq’s 20 examples grouped HTML wins into categories of LLM output: project status reports, code reviews, diagnostic summaries, data comparisons. The things an agent produces that a human then has to read, navigate, and act on. When one researcher ran all 20 prompts through Claude in both formats, HTML won 17 of the 20 head-to-head comparisons. The 3 cases where Markdown held its own were tasks where the output stays internal to an agent’s loop and never reaches a human at all. (Source.)

Once a person is the end consumer, HTML’s richer vocabulary starts earning its overhead. Collapsible sections. Semantic structure. Tabbed layouts. Inline labels. Color-coded status. Things Markdown has no syntax for, because Markdown was never designed to produce navigable deliverables. It was designed to produce readable plain text.

LLMs have also been trained on billions of HTML pages, so the semantics of those tags are deeply embedded in how these models understand and produce structure. That doesn’t go away just because Markdown became the default output convention.

For human-readable LLM output, HTML deserves a serious look. That part of the headline holds up.

Where It Does Not Apply

This is where the skimming gets expensive.

For input to an LLM, Markdown is still the right default, and by a wide margin. Markdown uses dramatically fewer tokens than HTML for equivalent content. A Cloudflare analysis found that the Markdown version of a typical blog post used 80% fewer tokens than its HTML counterpart. In RAG pipelines, Markdown-formatted inputs have been shown to boost accuracy by up to 35% while cutting token costs by 20 to 30%. On structured tasks like table extraction, Markdown outperforms HTML at roughly 60.7% accuracy versus 53.6% in GPT-based evaluations. (Source.)

Worth noting: Profound ran a controlled experiment across 381 pages on 6 websites to test whether serving Markdown to AI crawlers versus HTML made a meaningful difference in bot traffic. The result was a marginal directional advantage for Markdown (~16% mean lift) that wasn’t statistically significant. (Source.) Which is to say, well-formed HTML isn’t incomprehensible to LLMs. But when you’re paying per token, the math still favors Markdown clearly.

For documentation in repositories, nothing about Thariq’s observation changes the picture. Markdown’s native rendering in GitHub and GitLab, its readability as plain text, and its role as the standard syntax of developer documentation are not touched by this argument. If your docs live in Git and humans need to read and edit them, Markdown is still the answer. Full stop.

The Token Bill Reality

This deserves its own section because it’s where the “HTML is back!” take gets most dangerous most quickly.

The token efficiency gap between Markdown and HTML is real and large. 80% fewer tokens for equivalent content isn’t a rounding error. At any meaningful scale, that’s a direct line to your API costs. HTML earns that overhead only when the output is rich enough, and human-facing enough, to justify it.

If your workflow involves long context windows, high-volume RAG retrieval, or large amounts of text being ingested or passed between agents, the format you choose for that content has a real cost consequence. Thariq’s post is not an argument for switching to HTML across the board. Applied without that nuance, it’s an expensive misread.

The Framework That Actually Helps

Four scenarios, four answers:

A human writes context and feeds it to a model: Markdown. A model produces output that stays inside an agent loop: Markdown. A model produces a deliverable a human will read, navigate, and act on: HTML is worth the token cost. Documentation lives in a repository: Markdown, full stop.

The headline “HTML is the new Markdown” is accurate for exactly one of those four. The other three haven’t changed.

Thariq’s post isn’t a verdict. It’s a recalibration for a specific use case. The fact that it spread the way it did says less about the content and more about how hungry people are for permission to do the thing they already half-wanted to do.

I’m not pointing fingers. I had the same instinct.

Additional Sources:

If you found this interesting, please share.

© Scott S. Nelson