Google’s Upgraded Gemini 3 Deep Think Claims to Outperform GPT-5.2 and Claude Opus 4.6

Google on Thursday introduced a major upgrade to its advanced AI model, Gemini 3 Deep Think. First launched in December 2025, the model was already considered the company’s most powerful AI system. Google says the update delivers stronger performance across key benchmarks.

According to Google, the upgraded model can provide more effective support to scientists and researchers working on complex real-world problems. The company claims that the new version has surpassed OpenAI’s GPT-5.2 and Anthropic’s Claude Opus 4.6 in several major AI evaluations.

Enhanced for Research and Engineering Challenges

In an official blog post, Google explained that the updated Gemini 3 Deep Think is designed to address modern challenges in science, engineering, and research. The model remains available to Google AI Ultra subscribers. Additionally, select researchers and enterprise users can now access it through the company’s API.

CEO Sundar Pichai stated that Google refined the model in close collaboration with scientists to better understand and solve difficult real-world problems. Elon Musk, for his part, described the development as “Impressive.”

Record Performance Across Key Benchmarks

Following the upgrade, Gemini 3 Deep Think achieved an 84.6 percent score on the ARC-AGI-2 benchmark, which measures advanced reasoning ability. Google added that the ARC Prize Foundation verified this result.

Furthermore, the model set a new record on Humanity’s Last Exam, widely regarded as one of the toughest AI tests. Without using external tools, it scored 48.4 percent, outperforming competing models.

In addition, the model reached an Elo score of 3,455 on Codeforces, highlighting its strong coding and algorithmic skills. Across all these tests, Google claims Gemini 3 Deep Think outperformed rival models from OpenAI and Anthropic.

Real-World Applications in Science

Beyond benchmark achievements, Google also shared practical use cases. For instance, Lisa Carbone, a mathematician at Rutgers University, used Gemini 3 Deep Think to review a highly technical mathematics paper. According to the company, the model identified a subtle logical flaw that had previously gone unnoticed during human peer review.

With the gains Google reports in reasoning, coding, and research support, Gemini 3 Deep Think strengthens the company’s position in a competitive AI landscape.
