Google Research has been poking at a problem that’s bugged educators for years: how do you measure skills that are inherently messy? Critical thinking, collaboration, creative problem-solving — these are the things everyone agrees matter, but they’re a nightmare to grade at scale.
Standardized tests are too rigid. Real human assessment is too expensive and inconsistent. So the team, in partnership with NYU, built something called Vantage — a research experiment now available on Google Labs that uses generative AI to simulate group conversations and score students on these so-called “future-ready” skills. And the early results are surprisingly solid: the AI scoring is on par with human experts.
The setup: AI avatars that push back
Vantage drops students into a simulated team environment. You’re not answering multiple-choice questions. You’re having a conversation with AI avatars — think preparing for a debate or pitching a creative idea. The avatars are steered by an “Executive LLM” that follows a rubric and dynamically introduces challenges. Someone disagrees with your idea. A conflict emerges. The AI watches how you handle it.
This is the part that actually impressed me. The system doesn’t just passively record what you say. It adapts. If you handle a disagreement well, it might escalate. If you’re too passive, it might give you a nudge. It’s a next-generation adaptive assessment engine, not a glorified chatbot.
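To make that adaptive loop concrete, here's a minimal sketch of the kind of decision rule described above. It is not Vantage's actual implementation, which isn't public in this level of detail. The names (`Move`, `TurnAssessment`, `choose_next_move`), the two rubric signals, and the thresholds are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Move(Enum):
    ESCALATE = "escalate"   # raise the stakes of the disagreement
    NUDGE = "nudge"         # prompt a quiet student to contribute
    CONTINUE = "continue"   # let the conversation proceed unchanged


@dataclass
class TurnAssessment:
    # Hypothetical rubric signals, each assumed to lie between 0.0 and 1.0.
    conflict_handling: float
    participation: float


def choose_next_move(turn, high=0.7, low=0.3):
    """Pick the avatars' next move from one assessed student turn.

    The thresholds are illustrative assumptions, not published values.
    """
    # A passive student gets a nudge before anything else happens.
    if turn.participation < low:
        return Move.NUDGE
    # A student who handles disagreement well gets a harder challenge.
    if turn.conflict_handling >= high:
        return Move.ESCALATE
    return Move.CONTINUE


if __name__ == "__main__":
    print(choose_next_move(TurnAssessment(conflict_handling=0.9, participation=0.8)))
    print(choose_next_move(TurnAssessment(conflict_handling=0.5, participation=0.1)))
```

In a real system the two signals would presumably come from an LLM reading the transcript against the rubric rather than from hand-set numbers, but the branching idea is the same: probe harder when the student is doing well, and draw them out when they go quiet.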
Why this matters more than you think
We’ve been talking about “21st century skills” for two decades now, but most schools still default to grading what’s easy to measure: memorization, formulaic writing, test performance. The OECD and WEF keep publishing frameworks about critical thinking and collaboration, but turning those frameworks into actual classroom practice has been a slog.
Vantage isn’t the first attempt to automate soft skill assessment (I’ve seen plenty of clunky role-play sims over the years), but it’s the first that feels genuinely scalable. In the NYU study, the AI scoring matched human experts, a better result than I expected from something that’s still a research experiment. Nuance is hard to fake in a live conversation, and the system seems to catch things a human rater might miss, like whether you build on someone else’s idea or just agree with it.
The catch (there’s always one)
Vantage is still a sandbox. It’s available for sign-up in English, but it’s clearly early days. The scenarios feel curated, and I wonder how well the system generalizes across different cultural contexts or communication styles. Also, let’s be honest: students might game this once they figure out what the AI is looking for. That’s a problem every adaptive system faces.
Still, the approach is smarter than most. Instead of trying to measure skills in a vacuum, Vantage creates a controlled but dynamic environment where those skills actually manifest. That’s a meaningful step forward.
Bottom line
If you’re an educator or someone building assessment tools, Vantage is worth watching. It’s not a finished product, but it shows that GenAI can do more than write essays or generate images — it can evaluate how we think and interact. That’s a future-ready skill in itself.