Friction and Flow
On AI failure, organizational scale, and the question nobody is asking yet
The most interesting thing about how people talk about AI failure isn't the failure. It's the satisfaction.
The Aha Moment
When an AI hallucinates — produces something confidently wrong, fabricates a citation, misremembers a fact — a particular kind of person has a particular kind of reaction. Not disappointment, exactly. More like vindication. Aha. See, it can't be trusted. It has limitations. It breaks down.
What's happening technically is fairly well understood. Large language models fail most often at the edges of their training data and their context window. Push any system past its operational envelope and it degrades. That is not a defect specific to AI; it is how capability works. Every system has a failure boundary. Surgeons have them. Financial models have them. Bridges have them. We have generally learned to work within those boundaries rather than treating the existence of a boundary as proof of fundamental unreliability.
With AI, we don't. One hallucination becomes a verdict. This is worth noting not because AI deserves more generous treatment, but because the inconsistency of the standard reveals something about what's really driving the criticism — which is not a careful assessment of reliability but something closer to relief. The thing that was making us nervous can be dismissed after all.
The Wrong Scale
The deeper problem with the hallucination critique is that it applies individual-tool standards to something that is already, in practice, operating at organizational scale.
When AI is integrated into a serious workflow — not as a novelty but as a dependency — it isn't functioning as a single instrument that either works or doesn't. It's functioning as a layer in a system, the way a department or a team functions as a layer. You don't evaluate a department by asking whether any individual in it ever made a mistake. You evaluate it by whether the system as a whole produces reliable output over time, handles errors without catastrophic failure, and improves when given feedback.
By those standards, the relevant comparison isn't "does AI hallucinate" but "does it hallucinate more than the human processes it's replacing or augmenting." And that comparison tends to go unasked, because the human processes in question also fail constantly, expensively, and often without anyone declaring them fundamentally unreliable.
At organizational scale, the question of AI's viability isn't about individual failure rates. It's about what kinds of organizations AI can be part of, at what size, doing what kinds of work, and with what failure modes. That's a much more interesting question and we've barely started asking it.
What Actually Breaks Human Organizations
We have decades of research on large human organizations and a fairly clear picture of why they fail. The structural designs (hierarchies, information chains, decision layers, escalation paths) are mostly sound. The pathologies are behavioral. Department bloat, where units grow to protect their budgets rather than serve their function. Bureaucratic creep, where process accumulates until it becomes the primary output. Promotion by politics rather than competence. Information filtered at every layer by people protecting their position. Corruption as a feature, rather than a bug, of concentrated power.
These are not structural failures. They are human failures wearing structural clothing. The org chart didn't corrupt anyone. The incentive to corrupt was already there, and the structure provided the means.
The naive conclusion from this — and it is tempting — is that AI-native organizations would simply not have these problems. AI doesn't have political survival instincts. It doesn't promote its allies, protect its budget, or filter information upward to make itself look good. Strip out the human vices, the reasoning goes, and you get the organizational structure running cleanly at last, the way it was always meant to.
This reasoning is not wrong as far as it goes. It just doesn't go far enough.
What AI Already Knows About Friction
Several of the most successful AI training paradigms of the past decade have, independently of any organizational theory, converged on a striking pattern: adversarial pressure can produce better results than cooperative optimization alone.
Generative Adversarial Networks (GANs) work by pitting two systems against each other. A generator tries to produce convincing outputs; a discriminator tries to catch it failing. Neither improves without the other's resistance. Remove the adversary and the generator loses the only signal telling it what a convincing output looks like. The friction is not incidental to the process. It is the process.
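For readers who want the formal version, the contest between generator and discriminator is usually written as a two-player minimax game. The notation below is the standard formulation from the original GAN paper (Goodfellow et al., 2014), not something introduced by this essay: G is the generator, D the discriminator, p_data the distribution of real examples, and p_z the noise distribution the generator samples from.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is rewarded for scoring real samples high and generated samples low; G is rewarded for pushing the second term the other way. Each side's objective is defined only in terms of the other's behavior, which is the precise sense in which the opposition is the training signal.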
Multi-agent debate takes a similar principle into language models. When multiple AI instances are made to argue opposing positions on the same question, rather than simply answering it, the resulting answers are often more accurate and better reasoned than those a single instance produces alone, at least on the reasoning and factuality tasks where the approach has been tested. The disagreement improves the output. Premature consensus, in this context, is a failure mode.
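The shape of a debate loop can be sketched in a few lines. This is a toy under loud assumptions, not any published system: the agents are hypothetical stubs rather than language models, and `check` stands in for whatever lets an agent recognize a better answer once a peer has put one in front of it. Real systems have no such oracle.

```python
from collections import Counter
from typing import Callable, List

# An agent maps (question, peer_answers) -> answer. In a real system this
# would be a model call; here it is a hypothetical stand-in.
Agent = Callable[[str, List[str]], str]


def make_agent(first_guess: str, check: Callable[[str], bool]) -> Agent:
    """Build a stub agent that answers quickly, but re-examines when challenged."""
    def agent(question: str, peers: List[str]) -> str:
        if not peers:
            return first_guess  # unchallenged: return the quick first guess
        # Challenged: test its own answer and each peer answer in turn.
        for candidate in [first_guess] + peers:
            if check(candidate):
                return candidate
        return first_guess  # nothing survived scrutiny; keep the original
    return agent


def debate(agents: List[Agent], question: str, rounds: int = 2) -> str:
    """Run critique rounds, then return the most common final answer."""
    # Round 0: every agent answers independently.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees the others' current answers and may revise.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return Counter(answers).most_common(1)[0][0]


question = "What is 6 * 7?"
is_correct = lambda answer: answer == "42"  # assumed verifier, for illustration
agents = [make_agent(guess, is_correct) for guess in ("41", "42", "43")]

before = [agent(question, []) for agent in agents]  # independent answers
after = debate(agents, question, rounds=2)          # answer after debate
```

Working alone, two of the three stub agents are wrong. After being confronted with each other's answers, all three settle on the one that survives checking. The improvement comes entirely from the exposure to disagreement, since no agent got any smarter.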
And then there is self-play. AlphaGo Zero and AlphaZero didn't learn to master their games by studying human play. They learned by playing millions of games against themselves: internal adversarial pressure, sustained at a scale no human opponent could provide. (The original AlphaGo was bootstrapped from human expert games before moving to self-play; its successors dropped that step.) The result was performance that exceeded everything humans had produced over centuries of accumulated mastery.
AI, left to its own training logic, keeps arriving at the same answer: opposition works. Friction is not always the obstacle to learning. Sometimes it is the mechanism.
The Other Direction
But friction is not the whole story, and the evidence on the other side is just as substantial.
Elinor Ostrom won the Nobel Prize in Economics in 2009 for documenting something that formal economic theory had insisted couldn't exist: large groups of people successfully governing shared resources over long periods of time, without markets and without central authority. No outside hierarchy enforcing compliance. Just coordination norms, including the community's own monitoring and sanctions, that emerged from the community itself and held because the community trusted them. Stable, productive, durable. The theory said this was impossible. The empirics said it happened constantly.
Open collaboration produces something similar at a different scale. Linux, Wikipedia, and much of the foundational software infrastructure of the modern internet were built by distributed communities of people contributing voluntarily, coordinated by shared norms rather than command structures, and largely without contributors being set against one another. The outputs rival and in many cases exceed what comparable centralized, hierarchical organizations produced with vastly more resources.
Swarm intelligence offers a third data point. Ant colonies, bee hives, flocking birds — these systems produce emergent behaviors of remarkable sophistication without central coordination and without adversarial dynamics between agents. Each individual follows simple local rules; the collective output is complex, adaptive, and resilient. The result is achieved not through friction but through something closer to pure flow: each agent doing its small thing, the system doing something much larger than the sum of those things.
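The "simple local rules, complex collective result" claim is easy to see in miniature. The sketch below is an illustrative toy, not a model of any real colony or flock: six hypothetical agents sit in a ring, each holding a heading (treated as a plain number, with no angle wraparound), and the only rule is "average yourself with your two neighbours." Nobody sees the whole group and nobody is in charge.

```python
from typing import List


def step(headings: List[float]) -> List[float]:
    """Apply the local rule once: each agent averages with its two ring neighbours."""
    n = len(headings)
    return [
        # headings[i - 1] wraps to the last agent when i == 0, closing the ring.
        (headings[i - 1] + headings[i] + headings[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run(headings: List[float], rounds: int) -> List[float]:
    """Repeat the local rule for a fixed number of rounds."""
    for _ in range(rounds):
        headings = step(headings)
    return headings


def spread(headings: List[float]) -> float:
    """Gap between the most extreme headings: a crude measure of disagreement."""
    return max(headings) - min(headings)


start = [0.0, 90.0, 10.0, 80.0, 20.0, 70.0]  # agents begin badly misaligned
final = run(start, 50)

print(f"spread before: {spread(start):.1f}")
print(f"spread after:  {spread(final):.2e}")
print(f"shared heading: {sum(final) / len(final):.3f}")
```

The group ends up aligned on the average of where everyone started, even though no agent ever computed that average or saw more than two neighbours. There is no opposition anywhere in the loop, only repeated small acts of coordination.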
So we have two bodies of evidence, both credible, pointing in opposite directions. Friction produces better outputs. Flow produces better outputs. Both statements are empirically supported. The question is not which one is right.
Friction and Flow
The question is what determines which one a given situation calls for.
Some patterns suggest themselves. Adversarial dynamics seem to produce the best results when the task involves discrimination — distinguishing true from false, good from bad, correct from incorrect. GANs improve at generation because the discriminator is constantly forcing the generator to be more precise. Multi-agent debate improves reasoning because disagreement surfaces the weaknesses in each position. Opposition is a form of quality control, and quality control is most valuable when the cost of being wrong is high and hard to detect.
Flow and coordination — the Ostrom model, the open-source model, the swarm model — seem to perform best when the task involves construction rather than discrimination. Building something new, expanding a knowledge base, exploring a solution space. Here, adversarial pressure can be counterproductive: it narrows the search rather than broadening it, optimizes locally rather than globally, and produces confidence in the wrong things. What these situations call for is trust and a shared direction, not opposition.
If that pattern holds — and it is a hypothesis, not a conclusion — then the design question for AI-native organizations is not "how much friction should we have" but "what is this part of the organization trying to do, and which mode does that work call for." The same organization might need adversarial structures for its verification and decision-making layers, and flow-based structures for its generative and exploratory layers. Designing that deliberately, rather than defaulting to one mode throughout, is probably closer to what organizational excellence looks like at AI scale.
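As a design question, this amounts to choosing a coordination mode per layer rather than per organization. A minimal sketch of that idea follows; the category names and the mapping are hypothetical, lifted from the tentative pattern in this essay and not from any existing framework.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    FRICTION = "friction"  # adversarial: critics, red teams, structured debate
    FLOW = "flow"          # cooperative: shared norms, parallel contribution


class Work(Enum):
    VERIFY = "verify"
    DECIDE = "decide"
    GENERATE = "generate"
    EXPLORE = "explore"


# Assumed mapping: discrimination-type work gets friction,
# construction-type work gets flow. This encodes a hypothesis, not a result.
MODE_FOR_WORK = {
    Work.VERIFY: Mode.FRICTION,
    Work.DECIDE: Mode.FRICTION,
    Work.GENERATE: Mode.FLOW,
    Work.EXPLORE: Mode.FLOW,
}


def choose_mode(work: Work) -> Mode:
    """Look up the coordination mode assumed to suit a kind of work."""
    return MODE_FOR_WORK[work]


def plan(pipeline: List[Work]) -> List[Tuple[str, str]]:
    """Assign a mode to each stage of a pipeline, stage by stage."""
    return [(stage.value, choose_mode(stage).value) for stage in pipeline]


pipeline = [Work.EXPLORE, Work.GENERATE, Work.VERIFY, Work.DECIDE]
assignments = plan(pipeline)
```

The lookup table itself is trivial. The hard parts are everything it hides: classifying real work into these categories, and deciding who, or what, gets to maintain the table.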
Whether AI systems can make that determination themselves — recognizing when they need friction and when they need flow, and switching between modes accordingly — is a question that hasn't been seriously addressed yet. Current multi-agent architectures are mostly basic hierarchies: simple tree structures that reflect how we already think about software and tools. They are first-generation designs, reasonable starting points, and almost certainly not where this ends up.
The Experiment
The hallucination critique, revisited from here, looks like a category error. We are judging the reliability of something at the individual-tool level while it is already beginning to operate at a level where the relevant unit of analysis is the organization. And at that level, the question of whether any single component fails is less important than the question of how the system is structured to handle failure — and whether the structure knows the difference between the friction that wastes and the friction that works.
We are, without quite acknowledging it, at the beginning of the largest organizational experiment in history. AI-native organizations will be built. They will fail in ways we didn't predict and succeed in ways we didn't expect. The naive version will try to remove all friction and produce something efficient and brittle. The more interesting version will have to figure out what the friction was for.
How much of what slows us down is also part of what makes our accomplishments possible? That question doesn't have an answer yet. It may be the most important one on the table.