The Staircase Problem: Why Better AI Tools Make Bad Processes Worse
I used Claude (Anthropic) to help research and write this piece. The analysis and perspective are mine, but Claude did the heavy lifting on synthesizing industry data and making my draft readable.
Every large organization I talk to has done the same thing over the past year. They rolled out AI tools to everyone. ChatGPT, Claude, Copilot, internal wrappers around the same models. The intent was straightforward: make people faster at what they already do. And it worked. People got faster at what they already do.
That’s the problem.
Giving everyone an AI assistant is fixing one step on a staircase that was built for a world that no longer exists. The step works better now. But the staircase still spirals when it could go straight up. And because no one redesigned the staircase, the faster step just creates a pileup on the next one.
Software engineers are the first job family to experience this at full force. The tools they have today (Claude Code, Codex, Cursor) are not incremental. They are transformational. And the lesson from what happened to engineering teams is urgent, because every other knowledge work function is next. Investment research. Compliance. Portfolio management. The pattern is the same. The clock is ticking.
1. The Paradox: Faster Code, Slower Teams
2025 was the year of broken promises
By mid-2025, the vast majority of engineering teams were using AI in their workflows. Nearly half of companies reported that most of their code was AI-generated. Andrej Karpathy coined “vibe coding” to describe a new mode where developers express intent in natural language and let AI handle the rest. The adoption curve was steep and real.
But the system-level results told a different story. METR ran a rigorous randomized controlled trial with experienced open-source developers completing real tasks and found that AI tools actually made them 19% slower, despite those same developers believing they were faster. The gap between perceived and actual productivity was one of the most important findings of the year.
Bain & Company explained why. Writing and testing code accounts for only 25-35% of the time from idea to product launch. Speeding up that slice does almost nothing when everything upstream and downstream remains bottlenecked.
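Bain's point is just Amdahl's law applied to the delivery pipeline. A quick sketch makes the ceiling concrete; the 30% share and the 5x task speedup below are illustrative assumptions, not figures from the report:

```python
# Illustrative only: Amdahl's-law arithmetic applied to Bain's estimate that
# writing and testing code is ~25-35% of idea-to-launch time.
# The specific numbers (30% share, 5x speedup) are assumptions for the example.

def overall_speedup(accelerated_fraction: float, task_speedup: float) -> float:
    """End-to-end speedup when only one slice of the pipeline gets faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / task_speedup)

# Even if AI makes the coding slice 5x faster, and coding is 30% of the pipeline:
print(round(overall_speedup(0.30, 5.0), 2))   # 1.32 -> only ~32% faster end to end

# And even an infinitely fast coding slice caps out at 1 / 0.70 ~ 1.43x:
print(round(overall_speedup(0.30, 1e9), 2))   # 1.43
```

However large the task-level speedup, the other 70% of the pipeline bounds the whole.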
The code review crisis was the canary
The bottleneck shifted to code review almost overnight. Teams were merging dramatically more pull requests, but review times nearly doubled because the AI-generated code was larger, harder to follow, and riddled with subtle issues. Senior engineers spent several times longer reviewing AI suggestions than human-written code. Google’s DORA 2025 report made it official: AI adoption had a negative relationship with software delivery stability, even as throughput increased.
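The "nearly doubled" review times are what basic queueing theory predicts: as inflow approaches a fixed review capacity, wait times grow nonlinearly, not proportionally. A toy single-server (M/M/1) sketch shows the shape; all rates below are made-up illustrations, not survey data:

```python
# Toy M/M/1 queueing sketch: a review team modeled as one server with a
# fixed capacity. All rates are invented for illustration.

def avg_review_latency_hours(prs_per_day: float, capacity_per_day: float) -> float:
    """Mean time a PR spends queued plus in review (M/M/1: W = 1 / (mu - lambda))."""
    assert prs_per_day < capacity_per_day, "inflow >= capacity: the queue grows without bound"
    return 24.0 / (capacity_per_day - prs_per_day)

# A team that can review 10 PRs/day:
print(round(avg_review_latency_hours(5, 10), 1))   # 4.8 hours at half capacity
print(round(avg_review_latency_hours(9, 10), 1))   # 24.0 hours when AI nearly doubles inflow
```

Going from 5 to 9 PRs a day is less than a 2x increase in volume, but a 5x increase in latency. Past capacity, the queue simply never drains, which is what "the bottleneck shifted to code review" means in practice.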
The human cost was just as real. Developers using AI reported saving significant time writing code, yet most saw no decrease in overall workload. The time saved was consumed by coordination overhead, context switching, and managing higher volumes of changes. Positive sentiment toward AI tools dropped markedly in Stack Overflow’s annual survey. The “terrible work experience” problem wasn’t hypothetical. It was documented across every major developer survey last year.
2026 solved the tool problem. The process problem got worse.
This is where the story takes a turn that most people miss.
The 2026 model generation changed the equation. Opus 4.5 delivered state-of-the-art coding performance with consistent quality through 30-minute autonomous sessions. Opus 4.6, released in February 2026, went further: it set the highest score ever recorded on Terminal-Bench 2.0 and, crucially, can detect and correct its own mistakes during code review, a weakness in every prior generation. The "context rot" problem, where AI lost coherence over long sessions, was effectively eliminated. Claude Code now supports agent teams that collaborate in parallel, and practitioners report that cross-model workflows (Claude Code and Codex together) produce better results than either alone.
The review ecosystem matured in parallel. Cursor acquired Graphite for a reason: code review was eating a growing share of developer time as the time writing code kept shrinking. A new generation of system-aware AI reviewers emerged to address exactly this.
So the code quality problem is largely solved. The review tooling is catching up fast. And yet. The process problem hasn’t budged.
Sprint ceremonies designed for two-week human work cycles don’t make sense when AI generates features in hours. Standups where engineers report they “prompted Claude Code” don’t carry useful signal. QA processes built for human-pace throughput can’t absorb the volume. Team structures optimized for specialization break down when one person plus AI can do what three people did before. Deployment pipelines gated on manual approvals become the new bottleneck the moment everything upstream speeds up.
In 2025, organizations could blame the tools. In 2026, there’s nothing left to blame but the process.
2. The Pattern: A Sequence That Repeats
Five steps, every time
The software engineering experience reveals a transformation sequence that I believe applies to every knowledge work domain. It goes like this:
First, an AI tool accelerates a discrete task. Code writing. Earnings call summarization. Transaction flagging.
Second, the bottleneck shifts to adjacent steps that were never designed for the new throughput. Code review. Insight synthesis. Exception handling.
Third, the process requires fundamental redesign. Sprints give way to work cycles measured in hours, not weeks. Periodic compliance reviews give way to continuous monitoring. Manual analysis gives way to agent orchestration.
Fourth, team structures change. Roles merge, new roles emerge, headcount shifts. Developers become intent engineers who guide agents with clear objectives. Analysts become insight curators. Compliance officers become exception specialists.
Fifth, measurement frameworks need rebuilding. Lines of code, reports produced, cases processed: none of these capture value in the new model.
Some teams are already on the other side
The organizations extracting real value share a common trait: they redesigned the process around AI, not the other way around.
AWS published its AI-Driven Development Lifecycle, which replaces traditional sprints with “bolts,” work cycles measured in hours or days. The methodology positions AI as a central collaborator rather than an assistant. AWS was explicit about the motivation: simply retrofitting AI as an assistant constrains its capabilities and reinforces outdated inefficiencies.
GitHub released its open-source Spec Kit, placing specifications rather than code at the center of engineering. The philosophy: humans should review specs, plans, and acceptance criteria, not 500-line diffs. Human judgment moves from “Did you write this correctly?” to “Are we solving the right problem?”
Goldman Sachs offers the most complete financial services example. They integrated AI into their internal development platform, fine-tuned on their codebase and documentation, and are now piloting autonomous coding agents. CEO David Solomon noted that AI can complete 95% of an IPO prospectus in minutes, work that previously required a six-person team over two weeks. The remaining 5% is where human judgment lives.
The startup advantage matters here. AI-native startups design processes around AI from day one because they have no legacy to protect. Large enterprises face a harder problem: three out of four companies say the hardest part isn’t the technology. It’s getting people to change how they work.
Investment research is next
The same pattern is starting to play out in financial services, and the parallels are striking.
In investment research, the “just accelerating” version looks like this: using ChatGPT to summarize earnings calls faster, auto-extracting data from SEC filings, drafting research notes from templates. Harvard Business School research from 2025 found that AI-generated articles on Seeking Alpha were actually less informative than human articles, even though they expanded stock coverage. Speed without rethinking produces volume without insight.
The “rethinking the staircase” version looks different. AlphaSense launched Deep Research, an agent-based system producing analyst-level output from hundreds of millions of premium documents in minutes. Hebbia’s Matrix platform uses multi-agent orchestration to save investment bankers dozens of hours per deal. JPMorgan built a multi-agent system that automates complex investment research end-to-end.
The bottleneck in investment research is already shifting from data gathering and note drafting (the “code writing” equivalent) to insight synthesis and judgment (the “code review” equivalent). Firms that just accelerate the first step will drown analysts in AI-generated research they can’t process.
The same dynamic applies to compliance, portfolio management, and client reporting. In each case, the “accelerating” version is easy to spot: more alerts, faster summaries, auto-generated memos. And in each case, the real transformation requires rethinking the entire workflow, not just the step AI can speed up.
McKinsey estimates that a mid-sized asset manager could capture savings equal to a quarter to nearly half of its total cost base through AI-enabled workflow reimagination. But technology spending has grown steadily for years with virtually no correlation between higher spend and improved productivity. Firms added headcount to manage complexity rather than redesigning the processes creating it.
3. The Playbook: Redesign the Staircase
Start with the bottleneck, not the tool
McKinsey’s 2025 analysis surfaced a striking gap: only 6% of organizations report meaningful bottom-line impact from AI, while 88% use it regularly. That’s not a technology gap. It’s a process design gap.
The organizations that captured value share a clear pattern. Strong senior leadership ownership. Process redesign rather than bolt-on adoption. Human-in-the-loop validation. And portfolio discipline, scaling a few high-impact use cases fully before expanding. The firms that tried hundreds of disconnected AI experiments captured nothing. The ones that picked a handful of functions and transformed them end-to-end captured real value.
Deloitte’s 2026 survey of over 3,000 leaders confirmed that organizations investing in change management significantly outperform those that don’t. And here’s a counterintuitive finding: high-achieving AI organizations report far more anxiety about the transformation than low achievers. Bold transformation paired with real support structures outperforms comfortable incrementalism.
Learn from engineering, apply everywhere
The software engineering transformation gives every other function a playbook. Not a playbook of what tools to buy. A playbook of what questions to ask.
- Where is the equivalent of "code writing" in this function, the discrete task that AI can accelerate today?
- Where will the bottleneck shift once that task speeds up?
- What would the process look like if we designed it around AI from scratch, with no legacy constraints?
- What new roles emerge, and what existing roles need to evolve?
- What are we measuring today that will become meaningless, and what should we measure instead?
These questions are uncomfortable because they don’t have neat answers. They require leaders who are willing to admit that processes they built and optimized for years are now the constraint, not the solution. That takes a specific kind of courage.
The clock is real
The DORA 2025 report offered the sharpest insight for anyone navigating this: AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
Organizations with strong processes, clear ownership, and robust feedback loops will see AI amplify those qualities. Organizations with fragmented workflows, siloed data, and misaligned incentives will see AI amplify those problems too. Faster.
The firms that win won’t be the ones that deployed the most AI tools. They’ll be the ones that had the honesty to look at their staircase, admit it goes to the wrong floor, and redesign it while everyone else was polishing the steps.
That redesign starts now. Not with a new tool. With a harder question: what would this process look like if we built it today, knowing what AI can do?
The answer is almost never “the same thing, but faster.”
References
- METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity”
- Bain & Company, “From Pilots to Payoff: Generative AI in Software Development”
- Google, “2025 DORA Report”
- AWS, “AI-Driven Development Life Cycle: Reimagining Software Engineering”
- GitHub, “Spec Kit” (open source)
- Anthropic, “Introducing Claude Opus 4.5” and “What’s New in Claude 4.6”
- McKinsey, “How AI Could Reshape the Economics of the Asset Management Industry”
- McKinsey, “The State of AI in 2025”
- Deloitte, “The State of AI in the Enterprise, 2026”
- Harvard Business School, “AI Can Churn Out Financial Advice, But Does It Help Investors?”
- Addy Osmani, “The 80% Problem in Agentic Coding”
- Ankit Jain, “How to Kill the Code Review” (Latent Space)