Google DeepMind has finally unveiled the performance of Gemini 3.0, and there is no other way to describe it: the results are overwhelming.
Above all, the non-linear leap from Gemini 2.5 Pro to 3.0 Pro dispels the recent skepticism that the era of pre-training is over.
The numbers now make it clear that claims that foundation models had already hit their limits were premature.
The benchmarks released by Google DeepMind are shockingly clear.
- MathArena Apex: 0.5% → 23.4% (roughly a 47× jump; see the sketch below)
- ARC-AGI-2 (reasoning): 4.9% → 31.1%
- ScreenSpot-Pro: 11.4% → 72.7%
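For anyone who wants to double-check the multipliers, here is a minimal sketch that recomputes the gains from the score pairs listed above. Only the numbers come from the announcement; the dictionary layout and variable names are just for illustration.

```python
# Illustrative only: recompute relative gains from the reported benchmark scores.
scores = {
    "MathArena Apex": (0.5, 23.4),
    "ARC-AGI-2": (4.9, 31.1),
    "ScreenSpot-Pro": (11.4, 72.7),
}

for name, (before, after) in scores.items():
    ratio = after / before   # multiplicative gain, e.g. 23.4 / 0.5 ≈ 46.8x
    delta = after - before   # absolute gain in percentage points
    print(f"{name}: {before}% -> {after}%  ({ratio:.1f}x, +{delta:.1f} pp)")
```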
This isn’t just an incremental improvement. It directly overturns the conventional wisdom that performance would stop increasing even with larger models and more data, and it demonstrates once again how much headroom remains at the pre-training stage.
In other words, scaling still works, there is ample room for algorithmic improvement, and pre-training is far from over.
With the announcement of Gemini 3.0, we may be at the beginning of another explosive J-curve.
Source: https://blog.google/products/gemini/gemini-3/
