Is OpenAI GPT-5.2 actually better than Google Gemini 3 Pro? If you strip away the extra "thinking" time used in the benchmarks, the gap disappears. We dug into

OpenAI GPT-5.2: The “Cheating” Controversy

OpenAI recently released GPT-5.2, which posts superior benchmark results. However, online chatter suggests that OpenAI may have used more tokens and compute for its benchmark runs, which some consider “cheating” on the tests. If everything is equal, is GPT-5.2 actually on par with Gemini 3 Pro? Here we try to find out.

The "Cheating" Controversy: Compute & Tokens

The core of the controversy lies in inference-time compute. "Cheating" in this context refers to OpenAI using a configuration for benchmarks that is significantly more powerful (and expensive) than what is available to standard users or what is typical for a "fair" comparison.

  • "xhigh" vs. "Medium" Effort: Reports indicate that OpenAI's published benchmark results were generated using an "xhigh" reasoning effort setting. This mode allows the model to generate a massive number of internal "thought" tokens (reasoning steps) before producing an answer.
  • The Issue: Standard ChatGPT Plus users reportedly only have access to "medium" or "high" effort modes. The "xhigh" mode used for benchmarks consumes vastly more tokens and compute, effectively brute-forcing higher scores by allowing the model to "think" for much longer (sometimes 30-50 minutes for complex tasks) than a standard interaction allows.
  • Inference Scaling: This leverages a concept where allowing a model to generate more tokens during inference (test time) improves performance significantly. Critics argue that comparing GPT-5.2's "xhigh" scores against Gemini 3 Pro's standard outputs is misleading because it compares a "maximum compute" scenario against a "standard usage" scenario.
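
The critics' objection can be made concrete with a small sketch. The record type and the `is_like_for_like` helper below are hypothetical names introduced only for illustration, and the effort labels ("xhigh", "standard") are taken from the reports described above rather than from any official evaluation metadata. The idea is simply that two scores should be compared head to head only when they come from the same benchmark and the same compute setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One reported benchmark result (hypothetical record type for illustration)."""
    model: str
    benchmark: str
    score_pct: float
    effort: str  # reported inference-time compute label, e.g. "xhigh" or "standard"


def is_like_for_like(a: BenchmarkRun, b: BenchmarkRun) -> bool:
    """Return True only if both runs share the same benchmark and effort label."""
    return a.benchmark == b.benchmark and a.effort == b.effort


# Scores as quoted in this article; effort labels as alleged by critics (assumption).
gpt_run = BenchmarkRun("GPT-5.2", "ARC-AGI-2", 52.9, "xhigh")
gemini_run = BenchmarkRun("Gemini 3 Pro", "ARC-AGI-2", 31.1, "standard")

if not is_like_for_like(gpt_run, gemini_run):
    print("Not comparable: runs used different effort settings")
```

Under these assumed labels the two ARC-AGI-2 results would be flagged as not directly comparable, which is the critics' point in miniature.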

Benchmark Comparison (GPT-5.2 vs. Gemini 3 Pro)

When the massive compute boost is factored in, GPT-5.2 does post higher scores, but the gap narrows or reverses when conditions are scrutinized.

| Benchmark | GPT-5.2 (Thinking/Pro) | Gemini 3 Pro | Context |
|----|----|----|----|
| ARC-AGI-2 | 52.9% | ~31.1% | Measures abstract reasoning. GPT-5.2's score is heavily reliant on the "Thinking" process. |
| GPQA Diamond | 92.4% | 91.9% | Graduate-level science. The scores are effectively tied (within margin of error). |
| SWE-Bench Pro | 55.6% | N/A | Real-world software engineering. GPT-5.2 sets a new SOTA here. |
| SWE-Bench Verified | 80.0% | 76.2% | A more established coding benchmark. The models are roughly comparable here. |
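
To see how uneven the reported leads are, the gaps can be computed directly from the figures above. This is a minimal sketch using only the scores quoted in this article. The Gemini 3 Pro ARC-AGI-2 figure is approximate (~31.1%), and SWE-Bench Pro is left out because no Gemini 3 Pro score is listed.

```python
# (GPT-5.2 score, Gemini 3 Pro score) in percent, as quoted in the table.
scores = {
    "ARC-AGI-2": (52.9, 31.1),        # Gemini figure is approximate
    "GPQA Diamond": (92.4, 91.9),
    "SWE-Bench Verified": (80.0, 76.2),
}


def gap_points(gpt_pct: float, gemini_pct: float) -> float:
    """Lead of GPT-5.2 over Gemini 3 Pro in percentage points, rounded to 0.1."""
    return round(gpt_pct - gemini_pct, 1)


gaps = {name: gap_points(*pair) for name, pair in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: GPT-5.2 leads by {gap} points")
# ARC-AGI-2: GPT-5.2 leads by 21.8 points
# GPQA Diamond: GPT-5.2 leads by 0.5 points
# SWE-Bench Verified: GPT-5.2 leads by 3.8 points
```

Only the ARC-AGI-2 lead is large. It is also the benchmark whose score the table describes as most dependent on the "Thinking" process.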

  • Private Benchmarks: Some independent evaluations (e.g., restricted "private benchmarks" mentioned in discussions) suggest that Gemini 3 Pro actually outperforms GPT-5.2 in areas like creative writing, philosophy, and tool use when the "gaming" of public benchmarks is removed.

Are They "On Par"?

Yes, and Gemini 3 Pro may even be superior in "base" capability.

If "everything is equal", meaning both models are restricted to the same amount of inference compute (thinking time), the consensus in these discussions is that they are highly comparable, with different strengths:

  • Gemini 3 Pro Advantages:
      • Base Intelligence: Appears to have stronger fundamental capability in long-context understanding (massive context window), theoretical reasoning, and creative tasks without needing excessive "thinking" time.
      • Cost Efficiency: For many tasks, it achieves similar results with less compute (and thus lower cost/latency).
  • GPT-5.2 Advantages:
      • Agentic Workflow: With the "Thinking" mode enabled (high compute), it excels at complex, multi-step agents and coding tasks (SWE-Bench). It is "tuned" effectively to use extra compute to solve harder problems.

Conclusions

The claim that they are "on par" is accurate. If you strip away OpenAI's "xhigh" compute advantage used in benchmarks, Gemini 3 Pro is likely equal or slightly ahead in raw model intelligence. GPT-5.2's "superiority" in benchmarks largely comes from its ability to spend significantly more time and compute processing a single prompt.

The sources below cover the GPT-5.2 release, the Gemini 3 Pro comparison, and the associated benchmarking controversy.

References

1. Official Release Announcements

OpenAI – System Card Update

  • openai.com/index/gpt-5-system-card-update-gpt-5-2/

Google – The Gemini 3 Era

  • blog.google/products/gemini/gemini-3/

2. Benchmark Performance & Technical Analysis

R&D World – Comparative Analysis

  • Title: "How GPT-5.2 stacks up against Gemini 3.0 and Claude Opus 4.5"
  • Verified Details: Validates the 52.9% score on ARC-AGI-2 (Thinking mode) vs. Gemini 3 Pro's ~31.1%. Confirms GPT-5.2's lead in abstract reasoning is heavily tied to the "Thinking" process.
  • Source: rdworldonline.com/how-gpt-5-2-stacks-up

Vellum AI – Deep Dive

  • Title: "GPT-5.2 Benchmarks"
  • Verified Details: Verifies the 92.4% score on GPQA Diamond, noting that it is effectively tied with Gemini 3 Pro (91.9%), within the margin of error, even though OpenAI marketed it as a "win".
  • Source: vellum.ai/blog/gpt-5-2-benchmarks

Simon Willison’s Weblog

  • Title: "GPT-5.2"
  • Verified Details: Technical breakdown of the API pricing ($1.75/1M input) and the distinction between the "Instant" and "Thinking" API endpoints.
  • Source: simonwillison.net/2025/Dec/11/gpt-52/

3. The "Cheating" & Compute Controversy

Reddit (r/LocalLLaMA & r/Singularity)

  • Threads: "GPT-5.2 Thinking evals" & "OpenAI drops GPT-5.2 'Code Red' vibes"
  • Verified Details: These community discussions are the primary source of the "cheating" allegations. Users identified that OpenAI's benchmarks used "xhigh" (extra high) reasoning effort—a setting that uses significantly more tokens and time than the "Medium" or "High" settings available to standard users or used in Gemini's standard benchmarks.
  • Source: reddit.com/r/singularity/comments/1pk4t5z/gpt52thinkingevals/
  • Source: reddit.com/r/ChatGPTCoding/comments/1pkq4mc/

InfoQ News

  • Title: "OpenAI's New GPT-5.1 Models are Faster and More Conversational" (Contextual coverage including 5.2)
  • Verified Details: Discusses the introduction of the "xhigh" reasoning effort level and the trade-offs between benchmark scores and actual user latency/cost.
  • Source: infoq.com/news/2025/12/openai-gpt-51/
