AutoCoder: The First LLM to Surpass GPT-4 for AI-Assisted Coding
AutoCoder, a new AI model, surpasses GPT-4 Turbo in code generation accuracy with a 90.9% pass rate on HumanEval, showcasing superior code interpretation and versatility.
3 minutes

June 2, 2024
This content was generated using AI and curated by humans

Code generation is a rapidly evolving field aimed at enhancing software development by creating tools that can automatically generate, interpret, and debug code. These tools are crucial for modern software development as they improve efficiency and reduce programming errors. However, a significant challenge in this field is the creation of high-quality, large-scale datasets for training language models. Traditional methods of dataset creation are costly and time-consuming, often relying on manual annotation or expensive closed-source models, which limits the accessibility and scalability of developing powerful code generation tools.

Current Methods for Creating Code Instruction Datasets

Current methods for creating code instruction datasets include SELF-INSTRUCT, EVOL-INSTRUCT, and OSS-INSTRUCT. These methods use strong teacher models to generate synthetic coding instructions or to derive problems from open-source code snippets. However, they are limited by their dependence on the teacher models, which transfer incorrect knowledge alongside correct knowledge to the student models. This dependence caps a student model's performance at the quality and accuracy of its teacher, making significant breakthroughs in code generation capability hard to achieve.
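To make that limitation concrete, here is a minimal sketch of the general shape of such a teacher-driven pipeline. The `call_teacher_model` function is a hypothetical stand-in for any strong closed-source model API, not these methods' actual code; the point to notice is that nothing in the loop checks whether the teacher's solutions are correct before they enter the dataset.

```python
def call_teacher_model(prompt: str) -> str:
    """Hypothetical stand-in for a strong closed-source model API."""
    raise NotImplementedError("plug in a model API here")

def generate_instruction_pairs(seed_tasks, n_new=100):
    """Expand seed tasks into synthetic instruction/solution pairs.

    Note the limitation discussed above: whatever the teacher emits,
    right or wrong, goes into the dataset unchecked.
    """
    dataset = []
    for i in range(n_new):
        seed = seed_tasks[i % len(seed_tasks)]
        instruction = call_teacher_model(
            f"Write a new coding problem inspired by: {seed}"
        )
        solution = call_teacher_model(f"Solve this problem:\n{instruction}")
        dataset.append({"instruction": instruction, "solution": solution})
    return dataset
```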

Introducing AIEV-INSTRUCT: A Novel Method

Researchers from the University of Connecticut and AIGCode have introduced a novel method called AIEV-INSTRUCT. This method creates a high-quality code dataset through an interactive process involving two agents—a questioner and a programmer—that simulate coding and testing dialogues. The method transitions from proprietary models to self-learning stages, reducing reliance on costly closed-source models. This innovative approach addresses the limitations of existing methods and enhances the robustness and accuracy of the generated datasets.

Stages of AIEV-INSTRUCT

Teaching Stage

AIEV-INSTRUCT operates in two stages: the Teaching Stage and the Self-learning Stage. In the Teaching Stage, a proprietary model, GPT-4 Turbo, serves as the teacher, generating code instructions and high-quality code snippets whose correctness is verified through unit tests. Each sample goes through multiple rounds of interaction between the questioner and programmer agents, with execution feedback used to continuously refine the generated code.
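A minimal sketch of what one such round might look like, assuming the questioner and programmer agents are passed in as plain callables (the paper's actual agent interfaces may differ): failing unit-test output is appended to the prompt, and the programmer tries again.

```python
import subprocess
import tempfile

def run_unit_tests(code: str) -> tuple[bool, str]:
    """Write candidate code (with its unit tests) to a file, run it,
    and capture stderr as execution feedback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=30
    )
    return result.returncode == 0, result.stderr

def teaching_stage_round(questioner, programmer, task, max_rounds=4):
    """One dialogue in the style described above: the questioner poses
    the problem (with unit tests), the programmer writes code, and
    failing test output is fed back until the tests pass."""
    prompt = questioner(task)      # questioner formulates problem + tests
    for _ in range(max_rounds):
        code = programmer(prompt)  # programmer (teacher model) writes code
        passed, stderr = run_unit_tests(code)
        if passed:
            return {"instruction": prompt, "solution": code}  # validated pair
        prompt += f"\n\nExecution feedback:\n{stderr}"        # refine with errors
    return None  # discard samples that never pass their tests
```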

Self-learning Stage

Once the student model surpasses the teacher model in accuracy, the pipeline transitions to the Self-learning Stage, in which the student model autonomously generates and validates code. Here the student acts as both the questioner and the programmer, iteratively improving its performance through self-generated dialogues and execution feedback. This keeps the generated code accurate while reducing dependency on expensive closed-source models.
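Building on the round function above, the stage switch itself can be pictured as a simple comparison of measured accuracies. The `accuracy_of` callback is a hypothetical evaluation hook (e.g., a held-out benchmark score), not something the paper names; this is a sketch of the idea, not the authors' implementation.

```python
def build_dataset(tasks, teacher, student, accuracy_of):
    """Generate validated instruction/solution pairs across both stages.

    `accuracy_of` is a hypothetical callback returning a benchmark score;
    once the student outscores the teacher, it takes over both agent roles.
    """
    use_student = accuracy_of(student) > accuracy_of(teacher)
    agent = student if use_student else teacher  # Self-learning vs. Teaching
    samples = []
    for task in tasks:
        # The same model backs both the questioner and the programmer role.
        sample = teaching_stage_round(agent, agent, task)
        if sample:
            samples.append(sample)
    return samples
```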

Performance of AutoCoder

The performance of the proposed model, AutoCoder, trained with AIEV-INSTRUCT, is remarkable. AutoCoder achieved a pass rate of 90.9% on the HumanEval benchmark, surpassing top models like GPT-4 Turbo, which scored 90.2%. Moreover, AutoCoder demonstrated superior capabilities in code interpretation, allowing for the installation of external packages, unlike its predecessors, which were limited to built-in packages. This capability significantly enhances AutoCoder’s versatility and applicability in real-world coding scenarios.
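The external-package capability can be pictured as an execution loop that reacts to missing imports. The sketch below is an illustrative simplification, not AutoCoder's actual interpreter: it catches `ModuleNotFoundError`, pip-installs the missing package, and retries.

```python
import subprocess
import sys

def run_with_auto_install(code: str, max_installs: int = 3) -> bool:
    """Execute generated code; on a missing import, install it and retry.

    Illustrative only: installing packages requested by generated code
    is unsafe outside a sandboxed environment.
    """
    for _ in range(max_installs + 1):
        try:
            exec(compile(code, "<generated>", "exec"), {})
            return True
        except ModuleNotFoundError as err:
            subprocess.run(
                [sys.executable, "-m", "pip", "install", err.name],
                check=True,
            )
    return False
```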

Benchmark Testing

Furthermore, AutoCoder was tested on several datasets, including HumanEval+, MBPP, MBPP+, MultiPL-E, and DS-1000. It ranked first among all language models on the HumanEval Base Test and achieved top-five rankings on the other benchmarks. Specifically, AutoCoder-S, a smaller variant with 6.7 billion parameters, showed impressive results with pass rates of 78.7% on HumanEval and 79.4% on MBPP, highlighting its efficiency and accuracy even with fewer parameters.

Conclusion

In conclusion, the research introduces a significant advancement in code generation by proposing a cost-effective and accurate method for creating code instruction datasets. AutoCoder, trained with the AIEV-INSTRUCT method, exhibits exceptional performance, surpassing existing models on key benchmarks. This innovation enhances the efficiency of code generation tasks and provides a scalable approach to improving language models in coding applications. The contributions from the University of Connecticut and AIGCode demonstrate the potential for substantial improvements in software development processes, making high-quality code generation tools more accessible and effective for developers worldwide.


