
On Determinism, Hydrogen Atoms, and AI Agents

#ai #agents #determinism #physics

An open thread on determinism, AI agents, and my ability to care.

Had a conversation with a friend today that I’ve been chewing on ever since. Writing it down before it slips, because the more I think about it the more I think there’s something worth pulling on.

He opened with what sounded like a simple riddle. Can you think of a way to get a deterministic answer out of a non-deterministic system? Then he gave me the setup. Imagine a quantum random number generator sitting inside a black box. Truly quantum, one that isn’t ultimately reducible to some hidden variable you could in principle reverse-engineer. Is there any way to predict, with 100% certainty, what number it’s going to spit out?

I sat with that for a minute. My first answer was the boring one. No. That’s the whole point of quantum randomness. That’s what makes it interesting as a primitive. If you could predict it, it wouldn’t be doing the job we built it to do.

But that’s the boring answer, and based on the sly grin on his face, he was clearly going somewhere with this, so I humored him. I like to reason about problems by analogy to real-world systems, and the first one I could think of was the electron in a hydrogen atom. We don’t know where the electron is at any given moment. We can’t. The wavefunction tells us where it could be, with what probability density, and out of that we get these beautiful orbital shapes that map the regions of likely presence. But the electron’s specific location is undefined until we measure it. So in a way, it’s non-deterministic. We can’t tell you where the electron is.

But also, in another way, we kind of can? We can tell you with absolute certainty where it’s allowed to be, and how likely it is to turn up there. The wavefunction’s evolution is deterministic. Schrödinger’s equation is a perfectly well-behaved partial differential equation. The randomness only enters at the moment of measurement, when the system has to pick one of the eigenstates of whatever we chose to measure.

That felt like a useful reframe. The system isn’t fully random. The space of outcomes is constrained, knowable, and in a meaningful sense, deterministic. The collapse is what’s stochastic.
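For reference, here is the split between the deterministic part and the stochastic part, written out. These are standard textbook results rather than anything from the conversation, and picking the hydrogen ground state (1s) as the example is my assumption.

```latex
% Deterministic part: the wavefunction evolves under the Schrödinger equation
i\hbar \, \frac{\partial}{\partial t} \psi(\mathbf{r}, t) = \hat{H} \, \psi(\mathbf{r}, t)

% Stochastic part: a position measurement samples from the Born-rule density
P(\mathbf{r}) = |\psi(\mathbf{r})|^{2}

% Hydrogen ground state (1s), with a_0 the Bohr radius
\psi_{1s}(r) = \frac{1}{\sqrt{\pi a_0^{3}}} \, e^{-r/a_0}

% Radial probability density for the 1s state; it peaks at r = a_0
P_{\text{radial}}(r) = 4\pi r^{2} \, |\psi_{1s}(r)|^{2} = \frac{4 r^{2}}{a_0^{3}} \, e^{-2r/a_0}
```

Everything above is exact and repeatable. Only the individual measurement outcome drawn from the density is random.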

He let me ramble on about that, and then he reframed the whole question. What if we don’t actually need to know which specific number the QRNG generates? What if we only need to know that it falls within an acceptable range?

I’ll admit I had to sit with that one too, because it sounds almost too simple to be the point. But the more I turned it over, the more I realized that was the point. If our QRNG is bounded so that it can only output integers between 0 and 100, we can put a hook on the output and ask: is the number between 0 and 100? And the answer, every time, is yes. That answer is deterministic. Not the number itself. The verdict on the number.
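A minimal sketch of that hook, in Python. I don’t have a quantum source on hand, so `draw_number` is a hypothetical stand-in built on the standard library’s `secrets` module (an OS-backed CSPRNG, not a QRNG). The names `draw_number` and `in_range` are made up for illustration.

```python
import secrets

LOW, HIGH = 0, 100  # the bounded output range from the thought experiment


def draw_number() -> int:
    """Stand-in for the QRNG: an unpredictable integer in [LOW, HIGH].

    Assumption: secrets.randbelow is a CSPRNG, not a true quantum source.
    """
    return LOW + secrets.randbelow(HIGH - LOW + 1)


def in_range(n: int) -> bool:
    """The hook: a deterministic verdict on a non-deterministic value."""
    return LOW <= n <= HIGH


draws = [draw_number() for _ in range(1000)]
verdicts = [in_range(n) for n in draws]

# The draws vary from run to run; the verdict on each draw does not.
print(all(verdicts))
```

The individual numbers change on every run, but `all(verdicts)` is `True` every time, because the check depends only on the bound and not on which number came out.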

We haven’t tamed the quantum-ness of the system. We simply stopped caring about the uninteresting black-box portion and focused on what is acceptable.

The old adage ‘never let perfect be the enemy of good’ comes to mind.

The pivot to AI

This is where it got interesting, because he pivoted to AI agents. We spend a lot of time talking about how LLMs are non-deterministic. We can enter the same prompt and get different outputs. Temperature, sampling, floating-point quirks in batched inference, how close we are to the vernal equinox (/s), all the stuff that makes the model interesting also makes it unreliable from a classical software engineering standpoint. And there’s been this whole quiet anxiety around it. How do we build production systems on top of something that won’t give us the same answer twice?

He and I actually disagree a little bit about where the future of AI agents is headed. He believes AI agents are the future of damn near everything that can be done with AI. I don’t believe people will be as quick to give full access to agents that will almost certainly make wanton decisions and take wanton actions, because I do not trust that the vast majority of people will know how to create the proper guardrails.

His suggestion, and the thing I want to chew on, is that maybe we’ve been trying to solve the wrong problem. Maybe the goal was never to make the model deterministic. Maybe the goal is to wrap the model in tooling that makes the output deterministic in the ways that actually matter. Hooks. Gates. Validators. Schemas. The same trick as the set range for the QRNG, applied to inference.

This resonated with me. I have been looking for a reason to get excited about AI, agents (I hate this phrase, makes me cringe for some reason), agentic tooling, etc. It’s clear to me that the future of software engineering will be the clustering, orchestration, and automation of AI agents. It’s what I imagine the container boom in the early to mid 2010s was like when Kubernetes was released and the container ecosystem began to take off. To me, the AI models/tools themselves have been a great boon to my productivity, which is what I have enjoyed about them. I do not find the models/tools themselves to be interesting.

Imagine a scenario in which a new magical tool comes out that a master carpenter can use. This magical new tool can be used to create craftsmanship in an unfathomably fast way, and with a quality that doesn’t drop, given the proper amount of time, care, and direction. The master carpenter would be WAY more interested in producing the craftsmanship with this magical tool, rather than studying the magical tool, would they not? Imagine you’ve had all of these ideas you’ve wanted to build your entire life, and all of a sudden a magical, flying multi-tool comes along that makes all of that possible. You then start to slowly realize that in order to stay relevant, you have to learn how to make a lot of these magical, flying multi-tools work together, and autonomously.

Not particularly fun, compared to all of the beautiful cathedrals you’ve ever hoped and dreamed of creating.

That’s where I was, before thinking about the ‘pattern’.

The pattern that will potentially make me care

I took significantly more CEG/EE courses in college than I did CS/SWE classes, but I still learned about nondeterministic polynomial time (NP) problems.

I did some digging. SAT solvers work this way. The search through the solution space is intractable in the worst case and, from the outside, basically an unpredictable mess, but verifying a proposed solution is cheap and exact. Hash mining works this way. The hash itself is a black box of pseudorandomness, but we don’t care about the hash, we care whether it falls below a target.
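The hash-mining half of that can be sketched in a few lines. This is a toy, not any real chain’s protocol: the payload, the 8-byte nonce encoding, the single SHA-256 pass, and the difficulty in `TARGET` are all assumptions picked for illustration.

```python
import hashlib
from typing import Optional


def digest_value(payload: bytes, nonce: int) -> int:
    """Hash payload plus nonce and read the digest as one big integer."""
    data = payload + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def meets_target(payload: bytes, nonce: int, target: int) -> bool:
    """The cheap, exact check: one hash and one comparison."""
    return digest_value(payload, nonce) < target


def mine(payload: bytes, target: int, max_tries: int = 1_000_000) -> Optional[int]:
    """The expensive part: try nonces until one happens to land below the target."""
    for nonce in range(max_tries):
        if meets_target(payload, nonce, target):
            return nonce
    return None


# Hypothetical difficulty: roughly 1 in 4096 digests falls below this target.
TARGET = 1 << (256 - 12)

nonce = mine(b"hello", TARGET)
print(nonce is not None and meets_target(b"hello", nonce, TARGET))
```

Nobody can say in advance which nonce will work, but anyone can confirm a claimed nonce with a single call to `meets_target`.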

The entire NP complexity class is built on this asymmetry - generating a correct answer can be hard, but verifying one is easy. The whole P vs NP question is essentially asking whether that asymmetry is real: whether every problem with a quickly verifiable answer also has a quickly findable one.
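Here is that asymmetry in miniature, using SAT. The verifier is a single pass over the clauses, while the solver below is a naive brute-force search over every assignment. Real SAT solvers are far smarter than this, so treat `brute_force_solve` as a hypothetical baseline. The formula is a made-up example.

```python
from itertools import product
from typing import Optional

# A CNF formula is a list of clauses; each clause is a list of non-zero ints.
# Literal k means "variable k is True", -k means "variable k is False".
Formula = list[list[int]]


def verify(formula: Formula, assignment: dict[int, bool]) -> bool:
    """Cheap check: every clause must contain at least one satisfied literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_solve(formula: Formula, num_vars: int) -> Optional[dict[int, bool]]:
    """Expensive search: tries up to 2**num_vars assignments."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if verify(formula, assignment):
            return assignment
    return None


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
solution = brute_force_solve(formula, num_vars=3)
print(solution is not None and verify(formula, solution))
```

The cost of `verify` grows with the size of the formula. The cost of the search doubles with every added variable.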

And it maps cleanly onto agent engineering. When Claude Code writes a function and then runs the tests, that’s the pattern. The model generates whatever bullshit it’s going to generate, and the test suite checks it. The model is non-deterministic. The tests aren’t. The system as a whole produces working code - not because the model is reliable, but because the harness around it can be engineered to reject everything that doesn’t pass.
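A toy version of that generate-and-verify loop, with no real model involved. `flaky_generator` is a hypothetical stand-in that hands back one of a few candidate `add` implementations at random, most of them wrong. The test cases and the retry limit are assumptions too.

```python
import random
from typing import Callable, Optional

BinaryOp = Callable[[int, int], int]


def flaky_generator(rng: random.Random) -> BinaryOp:
    """Hypothetical stand-in for a model: returns a candidate 'add', often wrong."""
    candidates: list[BinaryOp] = [
        lambda a, b: a + b,      # correct
        lambda a, b: a - b,      # wrong
        lambda a, b: a * b,      # wrong
        lambda a, b: a + b + 1,  # wrong
    ]
    return rng.choice(candidates)


def passes_tests(fn: BinaryOp) -> bool:
    """The deterministic part: same candidate in, same verdict out."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]
    return all(fn(*args) == expected for args, expected in cases)


def generate_until_verified(rng: random.Random, max_attempts: int = 20) -> Optional[BinaryOp]:
    """Keep sampling candidates; only one that passes the tests gets out."""
    for _ in range(max_attempts):
        candidate = flaky_generator(rng)
        if passes_tests(candidate):
            return candidate
    return None  # fail closed: nothing unverified escapes the harness


accepted = generate_until_verified(random.Random())
if accepted is not None:
    print(accepted(40, 2))  # 42 whenever a candidate was accepted
```

The guarantee is only as strong as the test suite. The harness rejects what the tests can catch, and a candidate that is wrong in a way the tests never exercise would still get through.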

When we use constrained decoding to force a model to emit valid JSON, we’re doing the same thing one layer deeper. We’re not asking the model “pretty please produce valid JSON”, we’re modifying the sampling step itself so it literally cannot emit tokens that would violate the schema. The collapse is happening inside a basis I chose.
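To make the mechanism concrete, here is a toy sketch of grammar-constrained sampling. Nothing here is a real decoder or a real library API: the eight-token vocabulary, the `GRAMMAR` state machine, and `fake_logits` (random scores standing in for a model’s forward pass) are all hypothetical.

```python
import json
import math
import random

VOCAB = ["{", '"ok"', ":", "true", "false", "}", "hello", "<eos>"]

# Hypothetical grammar as a state machine: state -> {allowed token: next state}.
# It accepts exactly {"ok":true} and {"ok":false}.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "end"},
    "end": {"<eos>": "done"},
}


def fake_logits(rng: random.Random) -> dict[str, float]:
    """Stand-in for a model forward pass: arbitrary scores over the vocabulary."""
    return {token: rng.uniform(-2.0, 2.0) for token in VOCAB}


def constrained_sample(rng: random.Random) -> str:
    """Sample token by token, but only from tokens the grammar allows next."""
    state, pieces = "start", []
    while state != "done":
        allowed = GRAMMAR[state]
        scores = fake_logits(rng)
        # The constraint: softmax over the allowed tokens only, so a
        # forbidden token such as "hello" has no chance of being drawn.
        tokens = list(allowed)
        weights = [math.exp(scores[token]) for token in tokens]
        token = rng.choices(tokens, weights=weights, k=1)[0]
        state = allowed[token]
        if token != "<eos>":
            pieces.append(token)
    return "".join(pieces)


text = constrained_sample(random.Random())
print(text, json.loads(text))
```

Which value comes out (`true` or `false`) is still up to the sampler. What is fixed is that the output parses and has the expected shape, because the sampler is never offered any other token.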

That last part is where my earlier physics analogy feels even more relevant. In a Stern-Gerlach experiment, when you measure the spin of an electron along a particular axis, you’re not learning some pre-existing fact about the electron’s spin. You’re forcing it to pick one of two eigenstates relative to the axis you chose. The choice of measurement is the choice of basis, and the basis determines what answers are even available. That is exactly what a JSON schema or a grammar-constrained decoder is doing to the model. We still haven’t made the underlying process deterministic… but we’ve projected it onto a decision space where every possible outcome is one you can act on.
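The same point as a few lines of simulation. This assumes an idealized spin-1/2 particle prepared spin-up along z and then measured along an axis tilted by `theta`, with the standard Born-rule probability cos²(θ/2) of getting “up”. `measure_spin` is a made-up helper, and the pseudo-random generator is only standing in for the quantum randomness.

```python
import math
import random


def measure_spin(theta: float, rng: random.Random) -> int:
    """Measure a spin prepared 'up' along z, along an axis tilted by theta radians.

    Returns +1 or -1. Born rule: P(+1) = cos^2(theta / 2).
    """
    p_up = math.cos(theta / 2) ** 2
    return 1 if rng.random() < p_up else -1


rng = random.Random()
outcomes = [measure_spin(math.pi / 2, rng) for _ in range(1000)]

# Which outcome you get on a given shot is random...
print(sum(1 for o in outcomes if o == 1) / len(outcomes))
# ...but the set of possible outcomes was fixed when the axis was chosen.
print(set(outcomes) <= {1, -1})
```

The first number wobbles around one half from run to run. The second line prints `True` every time, and that is the part the rest of the system gets to rely on.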

Now, there is a caveat I want to acknowledge. This is an incredibly simple way to look at it when the question is “Has the model generated valid JSON?” That has a clear and obvious answer. I can’t, however, write a test for “is this picture I drew beautiful?” (It would be easy to answer if I were the one who drew it, as I struggle to produce even acceptable stick-figure art, but I digress.)

Where I land, at least for tonight

I have pretty consistently held the view “LLMs are non-deterministic, which is a flaw that almost certainly will not be fixed any time soon”, which has made my interest in “AI agents” virtually zero. I do recognize, however, that the future of software engineering (and most, if not all, white-collar work) will be dominated by “AI agents” (puke). So I have to find a reason to care if I want to stay relevant.

I then took a second to really flesh out why I have been uninterested in AI agents. If they’re non-deterministic, I do not trust them. If I do not trust them, I will not use them, and therefore I’m uninterested in the ecosystem around them. Then I thought to myself - if I feel this way, other people also probably feel this way. If you asked any IT professional who was around circa 2015 whether they regret not caring about Kubernetes, I’d wager they would say yes, emphatically.

So maybe I can use the motivation of not wanting to be left behind to drive me to change my mind, and find an interesting problem space (i.e., figuring out a way to create deterministic outputs from AI agents) to fuel the foray into the ecosystem.

I’m not going to sit here and pretend that my friend and I were the first pair of people to have this discussion. We certainly are not. The ‘LLMs are non-deterministic’ complaint is not a novel epiphany.

I’m also not going to sit here and pretend that I have come to some sort of profound conclusion that nobody else has about ‘The reliability of the system coming from the harness, not the model’. That’s not the point. It does give me a foundation to start from though, and hopefully I can use it to drive my interest in AI agents because I might be able to build a solution that does interest me.

The interesting engineering work, to me, isn’t prompt-tuning the model into reliability through the turn-loops. It’s not even figuring out how to turn the harnesses into a software engineering tool. It’s designing the constraints around the model. The schemas. The test suites. The validators. The hooks. The sanity checks. The bounded action spaces. The whole craft of building reliable software on top of unreliable inference is, in some sense, the craft of designing good measurements for a quantum system. You pick a basis, which then determines what you can know. The non-determinism stays where it is (and you don’t have to care about it), and you build the rest of your system to be okay with that.

Ironically, I will probably now feed this blog post to an AI agent and say “Hey, I’d like to talk about how I can turn my rambling into an interesting solution.”

Stay tuned to find out what the non-deterministic answer was!