When it can’t explain what it sees, it asks

Qualia is a small program that watches a webcam and tries not to be surprised. When it can’t explain what it’s seeing — when its prediction stays wrong no matter how it adjusts its beliefs — it does the thing most perception code never does: it asks a question, takes the answer, and remembers it. Next time, it isn’t surprised.

That behavior falls out of an old idea from neuroscience. Friston’s free energy principle — predictive coding before it — holds that a brain isn’t a camera passively receiving the world; it’s a prediction machine. Every layer guesses what the layer below is about to report, and all that flows upward is the error: the part of the signal the guess didn’t already cover. Perception, on this account, is just the work of making that error small. I wanted to feel what that’s like to build, so I wrote one — a hierarchy of belief layers on a laptop GPU, eating a webcam feed, trying to be unsurprised.

What a belief is, concretely

In the engine, a belief isn’t a vector — it’s a little distribution. Each layer holds a BeliefSlot: a 64-dimensional mean (its best guess at the current state), a 64-dimensional precision (how confident it is, per dimension — the inverse of variance), a prediction for what comes next, and the residual — the prediction error left over after the fact arrives. There’s a vfe field too: variational free energy, the quantity the whole system is trying to push down.
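As a rough illustration, the slot described above could be sketched in Rust like this. The field names and their order are assumptions taken from the prose, not the engine's actual definition; the real BeliefSlot may carry more fields. The `new` and `observe` helpers are hypothetical and exist only to show how the residual relates to the prediction.

```rust
/// Dimensions per belief, as stated in the text.
pub const DIM: usize = 64;

/// Hypothetical layout: only the five fields the text names, in the order it
/// names them. The real struct may differ.
#[repr(C, align(64))]
#[derive(Clone, Debug)]
pub struct BeliefSlot {
    pub mean: [f32; DIM],       // best guess at the current state
    pub precision: [f32; DIM],  // per-dimension confidence (inverse variance)
    pub prediction: [f32; DIM], // what the layer expects to arrive next
    pub residual: [f32; DIM],   // prediction error once the observation arrives
    pub vfe: f32,               // variational free energy for this slot
}

impl BeliefSlot {
    /// Zero mean and unit precision everywhere (an arbitrary starting point
    /// chosen for this sketch).
    pub fn new() -> Self {
        BeliefSlot {
            mean: [0.0; DIM],
            precision: [1.0; DIM],
            prediction: [0.0; DIM],
            residual: [0.0; DIM],
            vfe: 0.0,
        }
    }

    /// Store the per-dimension prediction error:
    /// residual = observation - prediction.
    pub fn observe(&mut self, observation: &[f32; DIM]) {
        for i in 0..DIM {
            self.residual[i] = observation[i] - self.prediction[i];
        }
    }
}
```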

That precision term is the part people skip and it’s the whole game — and the engine uses it in two opposite directions at once. A prediction error isn’t worth the same everywhere. Precision sets the alarm: an error in a dimension you’re confident about counts for more in the free energy the layer is trying to minimize, so it registers as real surprise rather than noise. But precision also sets the inertia: a belief you hold with high confidence resists being moved by any single error — it’s the dimensions you’re unsure about that yield fastest. So a confident belief shouts when it’s violated and still updates cautiously. The kernel computes both, every slot, every cycle, on the GPU.
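The two roles of precision can be sketched in a few lines of Rust. This is not the engine's kernel: `weighted_surprise` keeps only the Gaussian prediction-error term of the free energy (the log-precision term is dropped), and the gain formula in `update_mean` is a hypothetical rule picked to show inertia, not the one the GPU code uses.

```rust
pub const DIM: usize = 64;

/// The alarm: precision-weighted squared error, summed over dimensions,
/// 0.5 * sum_i precision[i] * residual[i]^2.
/// An error in a confident dimension contributes more.
pub fn weighted_surprise(precision: &[f32; DIM], residual: &[f32; DIM]) -> f32 {
    precision
        .iter()
        .zip(residual.iter())
        .map(|(&p, &e)| 0.5 * p * e * e)
        .sum()
}

/// The inertia: move the mean along the residual with a per-dimension gain of
/// rate / (1 + precision[i]), so confident dimensions take smaller steps.
/// Hypothetical update rule, for illustration only.
pub fn update_mean(
    mean: &mut [f32; DIM],
    precision: &[f32; DIM],
    residual: &[f32; DIM],
    rate: f32,
) {
    for ((m, &p), &e) in mean.iter_mut().zip(precision.iter()).zip(residual.iter()) {
        let gain = rate / (1.0 + p);
        *m += gain * e;
    }
}
```

With the same error in two dimensions, one held at precision 9 and one at precision 1, the confident dimension contributes nine times as much surprise but moves a fifth as far in a single update.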

Seven of these layers stack up. L6 is the raw sensory surface — the camera feeds straight into it. A semantic embedding model (I used Gemini) supplies the high-level “what should I expect to be looking at,” and it’s injected most strongly into L5, the deepest belief layer, fading as it blends down through L4, L3, L2. The lower layers learn to anticipate raw regularities; L5 is where the scene acquires meaning.
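One way to picture the fading injection is a per-layer blend weight. The sketch below is an assumption throughout: the weights are invented, the embedding is taken to be already projected down to 64 dimensions, and `inject_semantics` is a hypothetical name. Only the shape (strongest at L5, weaker through L4, L3, L2, absent elsewhere) comes from the description above.

```rust
pub const DIM: usize = 64;
pub const LAYERS: usize = 7;

/// Invented blend weights, indexed by layer number (L0..L6).
/// L5 gets the strongest injection; it halves at each step down to L2.
pub const INJECTION_WEIGHT: [f32; LAYERS] = [0.0, 0.0, 0.0625, 0.125, 0.25, 0.5, 0.0];

/// Blend the semantic embedding into each layer's expectation:
/// expectation = (1 - w) * expectation + w * embedding.
pub fn inject_semantics(expectations: &mut [[f32; DIM]; LAYERS], embedding: &[f32; DIM]) {
    for (layer, expectation) in expectations.iter_mut().enumerate() {
        let w = INJECTION_WEIGHT[layer];
        for (x, &e) in expectation.iter_mut().zip(embedding.iter()) {
            *x = (1.0 - w) * *x + w * e;
        }
    }
}
```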

The part that makes it active

Pure predictive coding is passive: minimize error by updating beliefs. Active inference adds the other lever. If you can’t explain away your surprise by changing your mind, you change the world — or go get the information that would make the surprise go away. Perception drives free energy down by fitting your beliefs to the world; action drives it down by fitting the world to your beliefs. (Which action to take is the part scored by expected free energy — the surprise you anticipate down each path you could choose.)

So in Qualia, when a layer’s residual stays stubbornly high — when it keeps being wrong in a way belief updates can’t fix — it doesn’t just keep absorbing the error. It writes a QuestionSlot: a question aimed at the outside world. The answer comes back and is stored as a LoreEntry — accumulated world-knowledge, tagged by the layer that needed it — and that lore feeds back into future predictions: the answer’s embedding nudges the asking layer’s baseline expectation, so the next time that pattern appears, the layer already predicts it instead of being surprised. The engine gets curious about exactly the things a good question could resolve — not noise it could never predict, but the ambiguities an answer would actually settle — and remembers what comes back so it won’t be surprised the same way twice. A machine that asks questions only about what confuses it is a surprisingly tidy consequence of “minimize free energy.”
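The ask-and-remember loop can be sketched as a small state machine. Everything below is a guess at the shape rather than the engine's code: the fields on QuestionSlot and LoreEntry, the `Curiosity` tracker, the threshold-and-patience trigger, and the `apply_lore` nudge are all hypothetical. In particular, this sketch only checks that the residual stays high; it does not try to tell unpredictable noise apart from a resolvable ambiguity.

```rust
pub const DIM: usize = 64;

/// A question aimed at the outside world (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct QuestionSlot {
    pub layer: usize,
    pub residual_norm: f32,
}

/// A stored answer, tagged by the layer that needed it (fields are hypothetical).
#[derive(Debug, Clone)]
pub struct LoreEntry {
    pub layer: usize,
    pub answer_embedding: [f32; DIM],
}

/// Tracks how long one layer's residual has stayed high.
pub struct Curiosity {
    pub layer: usize,
    pub threshold: f32, // residual norm that counts as "still surprised"
    pub patience: u32,  // consecutive high cycles before asking
    high_cycles: u32,
}

impl Curiosity {
    pub fn new(layer: usize, threshold: f32, patience: u32) -> Self {
        Curiosity { layer, threshold, patience, high_cycles: 0 }
    }

    /// Call once per cycle, after belief updates have run. Returns a question
    /// only when the residual has stayed above the threshold for `patience`
    /// cycles in a row.
    pub fn step(&mut self, residual: &[f32; DIM]) -> Option<QuestionSlot> {
        let norm = residual.iter().map(|e| e * e).sum::<f32>().sqrt();
        if norm > self.threshold {
            self.high_cycles += 1;
        } else {
            self.high_cycles = 0;
        }
        if self.high_cycles >= self.patience {
            self.high_cycles = 0; // reset so the layer does not re-ask every cycle
            Some(QuestionSlot { layer: self.layer, residual_norm: norm })
        } else {
            None
        }
    }
}

/// Nudge the asking layer's baseline expectation toward the answer's embedding.
pub fn apply_lore(baseline: &mut [f32; DIM], lore: &LoreEntry, nudge: f32) {
    for (b, &a) in baseline.iter_mut().zip(lore.answer_embedding.iter()) {
        *b += nudge * (a - *b);
    }
}
```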

When a layer can’t explain its surprise by updating, it acts: it asks a question, stores the answer as Lore, and won’t be surprised the same way twice. State what you expect; chase the residual when reality disagrees.

The unglamorous half: keeping four languages agreeing

None of that runs if the bytes don’t line up. The belief state lives in shared memory so a fleet of small processes — sensor capture, the belief kernels, the agent, a TUI supervisor — can all read and write it without copying. That means the exact memory layout of a BeliefSlot has to be identical in four places at once: the Rust struct (repr(C, align(64))), the Metal kernel, the CUDA kernel, and the Python bridge. One field added in the wrong spot and every process downstream reads garbage.

My favorite line in the whole project is a test:

assert_eq!(size, 1088, "BeliefSlot size changed — update CUDA kernel struct and Python bridge");

It’s a tripwire. The moment someone changes the belief layout, the build fails with a message telling them every other place they now have to fix. The same kernel ships in two dialects — Metal for Apple Silicon, CUDA for a Jetson — so the engine runs on a laptop and on a robot from one design.
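The same tripwire can also be moved to compile time. The sketch below assumes a field list (four 64-float arrays plus a single f32 for vfe) that happens to pad out to 1088 bytes under align(64); the project's real struct may get to that size differently. `FIELD_OFFSETS` is a hypothetical constant, and `std::mem::offset_of!` needs Rust 1.77 or newer.

```rust
use std::mem::{align_of, offset_of, size_of};

pub const DIM: usize = 64;

/// Assumed field list and order; not the project's actual definition.
#[repr(C, align(64))]
pub struct BeliefSlot {
    pub mean: [f32; DIM],       // bytes 0..256
    pub precision: [f32; DIM],  // bytes 256..512
    pub prediction: [f32; DIM], // bytes 512..768
    pub residual: [f32; DIM],   // bytes 768..1024
    pub vfe: f32,               // bytes 1024..1028, then padding up to 1088
}

// Compile-time tripwires: the build fails if any of these stop holding.
const _: () = assert!(size_of::<BeliefSlot>() == 1088);
const _: () = assert!(align_of::<BeliefSlot>() == 64);
const _: () = assert!(offset_of!(BeliefSlot, precision) == 256);
const _: () = assert!(offset_of!(BeliefSlot, vfe) == 1024);

/// Byte offsets, in field order, that a non-Rust reader of the shared memory
/// (a Python bridge, say) would have to agree with.
pub const FIELD_OFFSETS: [usize; 5] = [
    offset_of!(BeliefSlot, mean),
    offset_of!(BeliefSlot, precision),
    offset_of!(BeliefSlot, prediction),
    offset_of!(BeliefSlot, residual),
    offset_of!(BeliefSlot, vfe),
];
```

Checking offsets as well as total size catches a reordering of same-sized fields, which a size-only assertion would let through.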

Why I built it

I spend most of my time on the opposite kind of system: contracts and protocols where the whole job is to make behavior exactly predictable and then prove it. This was the inverse — a thing whose entire purpose is to be surprised well, to treat the gap between expectation and observation not as a bug to eliminate but as the signal to learn from. But the instinct underneath is the same one I bring to an audit: state precisely what you expect, watch the residual when reality disagrees, and chase the surprise instead of looking away from it. Here I just wired that loop directly into 64 floats and a GPU.

Code: the qualia/ engine in sensorforge — a robotics monorepo with iPhone/ARKit sensor capture, a Jetson voice assistant, and this active-inference core.
