By Super Admin · 15 Mar, 2024 · AI & Automation

Microsoft Research teaches AI tools how to debug code

Microsoft Research has introduced debug-gym, a novel environment designed to train AI coding tools in the complex art of debugging code. As AI’s role in software development expands, debug-gym aims to address a critical bottleneck: while AI can generate code efficiently, debugging remains a major time sink for developers.

The proliferation of AI coding assistants is enhancing developer productivity. GitHub CEO Thomas Dohmke predicted in 2023 that “sooner than later, 80% of the code is going to be written by Copilot”.

This trend is evident across the industry, with both large corporations and startups increasingly relying on AI for code generation. Y Combinator’s Garry Tan highlighted this, noting that for a quarter of their latest startup batch, 95% of the code was penned by large language models (LLMs).

However, the reality of software development involves far more debugging than initial code writing.

“As maintainers of popular open-source repositories, this resonates with us,” stated the Microsoft Research team. They posed a compelling question: “But what if an AI tool could propose fixes for hundreds of open issues, and all we had to do was approve them before merging?”

Bridging the gap: Interactive debugging for AI

Debugging, as defined by the researchers, is an interactive and iterative process to fix code. Developers typically form hypotheses about crashes, gather evidence by stepping through code execution, examine variable values (often using tools like the Python debugger, pdb), and repeat this cycle until the issue is resolved.
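
As a minimal, hypothetical illustration of that hypothesis-and-evidence loop (the function and the bug below are invented for the sketch), a developer might drive pdb like this:

    # buggy.py - function and bug invented purely for illustration
    import pdb

    def average(values):
        return sum(values) / len(values)  # crashes when `values` is empty

    # Hypothesis: the crash happens on empty input. Gather evidence:
    pdb.run("average([])")
    # Inside the debugger, a developer would typically:
    #   (Pdb) s              # step into average()
    #   (Pdb) p values       # inspect the argument -> []
    #   (Pdb) p len(values)  # -> 0, confirming the ZeroDivisionError hypothesis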

Debug-gym aims to equip AI agents with similar debugging capabilities. It asks: “to what degree can LLMs use interactive debugging tools such as pdb?”

The environment provides code-repairing AI agents with access to tools for active information-seeking, expanding their action and observation capabilities. Agents within debug-gym can set breakpoints, navigate code, inspect variable values, create test functions, and choose whether to investigate further or rewrite code based on their confidence level.
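
In outline, such an agent alternates between observing the environment and issuing tool actions until it is confident enough to propose a fix. The loop below is only a schematic sketch; `env`, `llm`, and the action strings are placeholders, not debug-gym's actual API:

    # Schematic agent loop; all names are placeholders, NOT debug-gym's real API.
    def debug_agent_loop(env, llm, max_steps=20):
        observation = env.reset()  # buggy repo plus the failing test output
        for _ in range(max_steps):
            # The model chooses its next tool action: set a breakpoint, print
            # a variable, run the tests, or rewrite code, based on confidence.
            action = llm.next_action(observation)  # e.g. "pdb b main.py:42"
            observation, done = env.step(action)
            if done:  # tests pass: a candidate fix has been found
                break
        return env.proposed_patch()  # surfaced for human approval before merging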

“We believe interactive debugging with proper tools can empower coding agents to tackle real-world software engineering tasks and is central to LLM-based agent research,” the Microsoft team explained.

Fixes proposed by these enhanced agents – following human approval – would be grounded in the specific codebase context, program execution details, and documentation, moving beyond mere guesswork based on training data.

Debug-gym is built with several key considerations:

  • Repository-level handling: Agents can access and modify files within the entire code repository.
  • Robustness and safety: Code execution occurs within sandboxed Docker containers, isolating the environment to prevent harmful actions while allowing thorough testing.
  • Extensibility: The platform is designed for easy integration of new debugging tools.
  • Text-based interaction: Observations are presented in structured text (like JSON), and actions use a simple text syntax, ensuring compatibility with modern LLMs (see the sketch after this list).
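
To make that text-based interface concrete, here is a hypothetical observation/action pair; the field names are invented, and debug-gym's real schema may differ:

    import json

    # Hypothetical observation the agent reads (field names invented):
    observation = {
        "tool": "pdb",
        "output": "> /repo/main.py(42)compute()\n-> total = total / count",
        "current_file": "main.py",
        "failing_tests": ["test_compute_empty_input"],
    }
    print(json.dumps(observation, indent=2))

    # ...and a simple text action the agent might emit in response:
    action = "pdb p count"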

Researchers can use debug-gym with custom repositories and evaluate agent performance using benchmarks like Aider (simple function generation), Mini-nightmare (short, buggy examples), and SWE-bench (real-world problems requiring deep codebase understanding).

Promising early results

Initial experiments involved a simple prompt-based agent using various LLMs (including Claude 3.7, OpenAI o1, and OpenAI o3-mini) equipped with debug tools like eval, view, pdb, rewrite, and listdir.

Even with these tools, solving complex issues like those in SWE-bench Lite remained challenging, with success rates rarely exceeding 50%; still, the performance uplift compared to agents without debugging tools was significant.

The success rate on SWE-bench Lite saw relative increases of 30% for Claude 3.7, 182% for OpenAI o1, and 160% for OpenAI o3-mini when debugging tools were available.
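
These figures are relative rather than absolute. As a worked example (the baseline rate below is an invented placeholder, not a number from the paper):

    # Converting a relative increase into an absolute success rate.
    def with_tools_rate(baseline, relative_increase):
        """relative_increase is a fraction, e.g. 1.82 for a 182% increase."""
        return baseline * (1 + relative_increase)

    # If an agent without debugging tools solved 15% of SWE-bench Lite tasks
    # (placeholder figure), a 182% relative increase would put the
    # tool-equipped agent at roughly 42.3%:
    print(with_tools_rate(0.15, 1.82))  # 0.423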

The researchers attribute the overall difficulty to the lack of sequential decision-making data (like debugging traces) in current LLM training datasets. However, the marked improvement validates the potential of this research direction.

Training AI code debug specialists

The Microsoft Research team believes fine-tuning LLMs specifically for interactive debugging is the next step. This necessitates creating specialised datasets, potentially recording agent interactions within the debugger as they gather information to solve problems.

Unlike standard reasoning tasks, interactive debugging involves a cycle of action, environmental feedback, and subsequent decision-making, requiring rich data capturing the entire problem-solving sequence.   
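
One plausible shape for such a training record, capturing the whole action-feedback-decision sequence rather than just the final fix (the schema and its contents are invented for illustration; no dataset format has been published):

    # Invented schema for a debugging-trace training record.
    trace = {
        "task": "fix failing test test_parse_date",
        "steps": [
            {"action": "pdb b parser.py:88", "feedback": "Breakpoint 1 set"},
            {"action": "pdb c", "feedback": "-> raw = s.split('-')"},
            {"action": "pdb p s", "feedback": "'2024/03/15'"},
        ],
        "final_patch": "use re.split(r'[-/]', s) instead of s.split('-')",
    }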

The plan includes fine-tuning an “info-seeking model” dedicated to gathering necessary bug-fixing information, which would then provide relevant context to a primary code generation model. This could potentially involve smaller, efficient info-seeking models feeding larger generation models, akin to an advanced Retrieval Augmented Generation (RAG) system, potentially saving on AI inference costs.
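
Schematically, that split could look like the sketch below; every name here is a placeholder, and the actual design may differ:

    # Schematic two-model pipeline; all names are placeholders, not real APIs.
    def repair(bug_report, debugger, seeker_llm, generator_llm, max_queries=10):
        # Stage 1: a small info-seeking model interrogates the running program,
        # much as a retriever gathers documents in a RAG pipeline.
        context = []
        for _ in range(max_queries):
            command = seeker_llm.next_debugger_command(bug_report, context)
            if command is None:  # the seeker decides it has enough evidence
                break
            context.append(debugger.run(command))
        # Stage 2: the larger generation model writes a patch grounded in the
        # collected execution evidence rather than training-data guesswork.
        return generator_llm.propose_patch(bug_report, context)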

By open-sourcing debug-gym, Microsoft Research invites the wider community to contribute to advancing interactive debugging agents and, more broadly, AI agents capable of actively seeking information from their environment.

See also: Open-source AI matches coding abilities of proprietary models

Tags: AI, AI Tools, Artificial Intelligence