Google’s been pushing its Gemini models hard, and they’ve gotten genuinely better over the last year. But Gemini is still a walled garden: you use it on Google’s terms or not at all. The Gemma open-weight line was supposed to give developers more breathing room, but Gemma 3 launched over a year ago, and in AI time that’s practically ancient.
Starting today, Gemma 4 is available, and it comes in four sizes. Google finally listened to the licensing complaints too, and is switching to Apache 2.0. No more of that custom Gemma license nonsense that made lawyers twitch.
The big news is local usability. Google designed these models to run on actual hardware you can own, not just cloud instances. The two large variants are a 26B Mixture of Experts and a 31B Dense model. Both are meant to run unquantized in bfloat16 on a single 80GB Nvidia H100 GPU. Yes, that’s a $20,000 accelerator, but it’s still local. Quantize them down to lower precision and they’ll fit on consumer GPUs without too much pain.
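To sanity-check the single-H100 claim, here is a back-of-the-envelope sketch of weight memory at different precisions. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The 24 GB consumer card is my own assumption for illustration, not a figure from Google.

```python
# Rough weight-only memory estimate. Ignores KV cache, activations,
# and framework overhead, so treat the results as a lower bound.
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

H100_GB = 80.0      # from the article: single 80GB H100
CONSUMER_GB = 24.0  # assumption: a high-end consumer GPU


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes per param = gigabytes
    return params_billions * BYTES_PER_PARAM[dtype]


def fits(params_billions: float, dtype: str, gpu_gb: float) -> bool:
    """True if the weights alone are smaller than the GPU's memory."""
    return weight_memory_gb(params_billions, dtype) < gpu_gb


if __name__ == "__main__":
    for name, size in [("26B MoE", 26.0), ("31B Dense", 31.0)]:
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, dtype)
            print(
                f"{name:9s} {dtype:8s} ~{gb:5.1f} GB  "
                f"H100: {fits(size, dtype, H100_GB)}  "
                f"24GB card: {fits(size, dtype, CONSUMER_GB)}"
            )
```

By this estimate the 31B model needs about 62 GB for weights in bfloat16, which leaves some headroom on an 80GB card, and about 15.5 GB at 4-bit, which is where consumer hardware comes into play.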
The 26B MoE model is the interesting one for speed. It only activates 3.8 billion of its 26 billion parameters during inference, which is roughly 15 percent of the weights per token, or a total-to-active ratio of about 6.8 to 1. That should mean significantly higher tokens-per-second than similarly sized dense models, though all 26 billion parameters still have to sit in memory. Google claims they focused on reducing latency to make local processing actually feel responsive.
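The arithmetic behind that ratio is simple enough to sketch. The per-token compute figure below uses the common rule of thumb of about 2 FLOPs per active parameter for a forward pass. That is a rough approximation on my part, not a number from Google, and it ignores attention cost, routing overhead, and memory bandwidth.

```python
# Parameter counts from the article, in billions.
MOE_TOTAL_B = 26.0
MOE_ACTIVE_B = 3.8
DENSE_B = 31.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def approx_forward_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params_b * 1e9


if __name__ == "__main__":
    frac = active_fraction(MOE_ACTIVE_B, MOE_TOTAL_B)
    ratio = MOE_TOTAL_B / MOE_ACTIVE_B
    print(f"Active share per token: {frac:.1%}")
    print(f"Total-to-active ratio:  {ratio:.1f}x")

    moe = approx_forward_flops_per_token(MOE_ACTIVE_B)
    dense = approx_forward_flops_per_token(DENSE_B)
    print(f"MoE   ~{moe:.2e} FLOPs/token")
    print(f"Dense ~{dense:.2e} FLOPs/token ({dense / moe:.1f}x the MoE)")
```

Keep in mind this only describes compute per token. It says nothing about output quality, and it does not shrink the memory footprint, since every expert still has to be loaded.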
The 31B Dense variant takes the opposite approach: all parameters all the time, trading speed for quality. Google expects developers to fine-tune it for specific use cases rather than running it raw.
I’ve seen this MoE approach tried before with mixed results. Sometimes the routing between experts becomes a bottleneck itself, or the quality gap between MoE and dense models is too wide for production work. But if Google’s latency claims hold up, this could be genuinely useful for on-device applications where you can’t afford cloud round-trips.
The Apache 2.0 license change is probably the most pragmatic decision here. The custom Gemma license was confusing and created unnecessary friction. Developers want to experiment, deploy, and potentially commercialize without hiring a lawyer to read fine print. Google finally got that message.
No word yet on community benchmarks or how Gemma 4 stacks up against Llama 4 or Mistral’s latest. I’ll be watching the independent evaluations closely. Google’s track record with open models has been solid but not market-leading. The parameter efficiency of that 26B MoE could change that narrative if the quality holds up.