Goodfire, a San Francisco startup, just dropped a tool called Silico that does something most people would have called science fiction a few years ago: it lets you pop the hood on a large language model, see what individual neurons are doing, and turn dials while it’s still training.
This is a big deal. Right now, training an LLM is mostly a matter of throwing compute and data at the problem and hoping for the best. Models like ChatGPT and Gemini can do incredible things, but nobody fully understands how they do them. Goodfire’s CEO Eric Ho puts it bluntly: “The dominant feeling in every major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI and nothing else matters. And we’re saying no, there’s a better way.”
Silico is built on mechanistic interpretability, an approach that maps out the neurons and pathways inside a model to understand what’s actually happening when it generates text. MIT Technology Review just named it one of the 10 Breakthrough Technologies of 2026, and Goodfire is one of a handful of companies, alongside Anthropic, OpenAI, and Google DeepMind, pushing it forward.
What sets Silico apart is that it’s not just for auditing models that are already finished. Goodfire wants to use it during the design and training process itself. “We want to remove the trial and error and turn training models into precision engineering,” says Ho. “That means exposing the knobs and dials so that you can actually use them during the training process.”
The tool uses AI agents to automate a lot of the heavy lifting. Ho says agents are now strong enough to do interpretability work that previously required human researchers. That’s the breakthrough that makes this a viable product rather than just a lab experiment.
So how does it work? You zoom in on specific neurons or groups of neurons, assuming you have access to the model’s internals, which for now mostly means open-source models. You check which inputs make them fire and trace the pathways upstream and downstream. Then you can adjust the parameters connected to those neurons to boost or suppress certain behaviors.
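To make the “check what inputs make them fire” step concrete, here is a minimal sketch in plain Python. It is not Silico’s API, which isn’t described here; the tiny network, its weights, the vocabulary, and the prompts are all invented for illustration. It runs a handful of prompts through a toy hidden layer and ranks them by how strongly each one activates a chosen neuron.

```python
# Toy bag-of-words network. Every weight and prompt below is hypothetical.
VOCAB = ["trolley", "lever", "track", "recipe", "flour", "oven"]

# W_IN[j][i] is the weight from vocabulary word i to hidden neuron j.
W_IN = [
    [1.2, 0.9, 0.8, -0.5, -0.4, -0.6],  # neuron 0: responds to dilemma words
    [-0.6, -0.3, -0.2, 1.1, 1.0, 0.9],  # neuron 1: responds to cooking words
]

PROMPTS = [
    "pull the lever to divert the trolley",
    "the trolley is on the track",
    "preheat the oven and sift the flour",
    "a recipe with flour",
]


def encode(text):
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def hidden_activations(text):
    """ReLU activations of the hidden neurons for one prompt."""
    x = encode(text)
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_IN]


def top_activating(neuron, prompts, k=2):
    """Return the k prompts that make the given neuron fire hardest."""
    scored = [(hidden_activations(p)[neuron], p) for p in prompts]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    for neuron in range(len(W_IN)):
        print(f"neuron {neuron}:", top_activating(neuron, PROMPTS))
```

On a real model the same idea means recording activations at a chosen layer across a large dataset, which is where the automation comes in. The ranking step itself stays this simple.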
Goodfire has some compelling examples. They found a neuron inside Qwen 3 that was associated with the trolley problem. Activating it made the model frame everything as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho.
Another example is more practical. They asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing negative business impact. By boosting neurons associated with transparency and disclosure, they flipped the answer from no to yes nine out of ten times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” Ho explains.
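The “outweighed” framing is easy to picture with a toy calculation. The sketch below is an assumption-laden illustration, not Goodfire’s method: the two feature names, their activations, and the readout weights are all made up. It only shows how scaling one internal signal can tip a decision that another signal was dominating.

```python
# Hypothetical readout weights: a positive total means "yes, disclose".
READOUT = {"transparency": 1.0, "commercial_risk": -1.5}


def answer(features, boost=None):
    """Return ("yes" or "no", logit) after optionally scaling some features."""
    feats = dict(features)  # copy so the caller's dict is left untouched
    for name, factor in (boost or {}).items():
        feats[name] *= factor  # amplify (or damp) the chosen activation
    logit = sum(READOUT[name] * value for name, value in feats.items())
    return ("yes" if logit > 0 else "no"), logit


# Made-up activations: both signals are present, but risk wins by default.
baseline = {"transparency": 0.8, "commercial_risk": 0.9}

if __name__ == "__main__":
    print(answer(baseline))                                 # risk dominates
    print(answer(baseline, boost={"transparency": 3.0}))    # transparency wins
```

In this toy setup the transparency signal is there from the start; boosting it changes which side of zero the total lands on, without adding anything new to the model.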
Silico can also help during training by filtering out specific data that would teach the model unwanted behavior. For instance, they traced the infamous “9.11 vs 9.9” bug, where models claim 9.11 is the larger number, to neurons associated with biblical verses and code version numbering. You can catch that stuff before it becomes a problem.
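Why would verse and version-number neurons matter here? One plausible reading, offered as an illustration and not as Goodfire’s analysis, is that “9.11 comes after 9.9” is actually correct under those conventions. The helper names below are invented for this sketch.

```python
def as_decimal(text):
    """Read "9.11" as an ordinary decimal number."""
    return float(text)


def as_version(text):
    """Read "9.11" as dotted parts, the way versions or chapter.verse work."""
    return tuple(int(part) for part in text.split("."))


a, b = "9.11", "9.9"

if __name__ == "__main__":
    print("decimal: 9.11 > 9.9 is", as_decimal(a) > as_decimal(b))
    print("version: 9.11 > 9.9 is", as_version(a) > as_version(b))
```

As decimals, 9.11 is smaller than 9.9. As version numbers, 11 beats 9 in the second position, so 9.11 comes later. A model that has seen plenty of both conventions has a real ambiguity to resolve.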
But let’s not get carried away. Leonard Bereska, a researcher at the University of Amsterdam who works on mechanistic interpretability, thinks Silico looks useful but pushes back on the grand claims. “In reality, they are adding precision to the alchemy,” he says. “Calling it engineering makes it sound more principled than it is.”
He’s not wrong. We’re still in the early days of understanding how these models work. Goodfire’s tool is a step forward, but it’s not magic. And the tool itself requires access to model internals, which most people don’t have for the big proprietary models.
Still, I like what Goodfire is doing. The AI industry has been moving too fast with too little understanding of what’s actually happening inside these systems. Any tool that helps developers see what they’re building and fix problems before they go live is a welcome addition. Is it engineering yet? Not quite. But it’s a hell of a lot better than blind faith in scale.