Bito’s AI Architect Achieves Highest Success Rate of 60.8% on SWE-Bench Pro

Author: AI Journal

Source: AI Journal

2026/02/03 23:30

Significant improvement of 39% over leading Opus and Sonnet models

MENLO PARK, Calif., Feb. 3, 2026 /PRNewswire/ — Bito, the company building deep context graphs for coding agents, today announced evaluation results for its AI Architect context engine. A Claude Sonnet 4.5 agent augmented with Bito’s AI Architect achieved a 60.8% success rate on SWE-Bench Pro, the leading benchmark for evaluating AI coding agents on long-horizon software engineering tasks.

Bito’s AI Architect shows how context dramatically boosts agent performance.

The augmented agent significantly outperformed the baseline Claude Sonnet 4.5 agent (without AI Architect), which scored 43.6%. The difference is a 39.4% relative improvement.
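As a quick check of the arithmetic, the relative improvement can be recomputed from the two success rates quoted above. The sketch below is illustrative only: the function name `relative_improvement` is an assumption made for this example and is not part of Bito's or The Context Lab's tooling.

```python
def relative_improvement(baseline: float, augmented: float) -> float:
    """Return the gain of `augmented` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (augmented - baseline) / baseline * 100.0


# Success rates (in percent) as quoted in the release.
baseline_rate = 43.6   # Claude Sonnet 4.5 without AI Architect
augmented_rate = 60.8  # Claude Sonnet 4.5 with AI Architect

# Absolute gain is measured in percentage points; relative gain in percent.
absolute_gain = augmented_rate - baseline_rate
relative_gain = relative_improvement(baseline_rate, augmented_rate)

print(f"absolute gain: {absolute_gain:.1f} percentage points")  # 17.2
print(f"relative gain: {relative_gain:.1f}%")                   # 39.4%
```

The two figures answer different questions: the agent resolved 17.2 percentage points more tasks in absolute terms, which is 39.4% more than the baseline's own success rate.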

Third-party evaluator The Context Lab measured agent performance through success rate and efficiency metrics including speed, tool calls, and token costs. The analysis breaks down results by repository size, task complexity, and task categories. The strongest improvements are reported in UI/UX enhancements (over 200%), performance bugs (over 100%), critical bug fixes (+50%), and security bugs (+37.5%).

“The industry has been focused on AI models alone as the path to improved performance in the software development cycle. But we are starting to see that there are many other levers to pull to dramatically improve performance,” said Amar Goel, CEO of Bito. “A significant, competitive edge comes from the context engine powering those agents. Bito’s AI Architect provides deep structural and semantic understanding of entire codebases, their dependencies, and usage patterns to coding agents such as Claude Code and Cursor via MCP. The benchmark results proved the importance of the context engine.”

The evaluation used identical Claude Sonnet 4.5 agents under two conditions. In the baseline condition, the agent relied on native file search and tool-driven exploration to infer repository structure. In Bito’s AI Architect condition, the agent was augmented with system-level codebase intelligence via the AI Architect MCP.

The evaluation focused on the five largest repositories in the SWE-Bench Pro benchmark by lines of code and file count, spanning multiple programming languages. This selection emphasizes systems where scale, dependency depth, and architectural complexity materially affect task difficulty.

Enterprise development teams can use Bito’s AI Architect to gain system-level codebase intelligence. It builds a dynamic knowledge graph of their repositories, modules, APIs, and dependencies so that institutional knowledge is put to full use.

Learn the details of the report here.

Visit Bito.ai for more information about AI Architect.

Media Contact:
[email protected]

View original content: https://www.prnewswire.com/news-releases/bitos-ai-architect-achieves-highest-success-rate-of-60-8-on-swe-bench-pro-302676926.html

SOURCE Bito
