OpenAI's GPT-5.3-Codex: This is the new AI that programs software almost on its own

  • GPT-5.3-Codex is OpenAI's most advanced programming model, 25% faster and more efficient than its predecessors.
  • The system acts as a development agent: it understands complete repositories, creates apps and web games, and covers the entire software lifecycle.
  • OpenAI used early versions of the model to debug its own training, making GPT-5.3-Codex an AI that helps create itself.
  • It is now available in ChatGPT's paid plans and on the Codex platform (desktop app, web, terminal and IDE extensions), with an added focus on security and cyber defense.

OpenAI GPT-5.3-Codex Model

The new generation of OpenAI programming models already has a name: GPT-5.3-Codex. It is a system designed to go far beyond the typical assistant that completes lines of code; it aspires to become a development agent capable of working for hours on complex projects, from the first commit to deployment.

In this context, the San Francisco-based company presents its model as a significant leap forward compared to GPT-5.2 and GPT-5.2-Codex, combining the programming capabilities of the Codex line with the advanced reasoning of the general-purpose GPT-5.2 model. The stated objective is clear: to let both engineering teams and non-technical professionals delegate to AI a good part of the routine work that currently absorbs most of their time.

What is GPT-5.3-Codex and why does it mark a new stage?

GPT-5.3-Codex is OpenAI's latest specialized programming model, designed as an agent that not only generates code snippets but also undertakes lengthy tasks involving research, tool use, and complex execution. The company itself describes it as an AI capable of doing almost everything a professional can do in front of a computer, from writing code to documenting and monitoring services.

Compared to previous generations, the key is that this model understands complete repositories. It can analyze the folder structure, dependencies, automated tests, and documentation before making any changes. This reduces the risk of breaking changes and makes it easier for the agent to respect the style, internal conventions, and design decisions already present in the code.

According to data provided by OpenAI, GPT-5.3-Codex delivers 25% higher performance than its predecessors in programming tasks while consuming fewer tokens to produce equivalent results. This combination of performance and efficiency allows it to work longer on the same project, maintain context, and chain decisions together without losing track.

Beyond writing code, the model is designed to support the entire software lifecycle: debugging, implementation, monitoring, testing, metrics analysis, and documentation. In practice, it comes closer to the role of an autonomous developer, one that must of course continue working under human supervision, as the company itself insists.

One of the most striking aspects is the way it communicates with the user. Instead of simply returning a final response, GPT-5.3-Codex explains what it is doing, sends status updates, and accepts corrections or changes of course while it works. This continuous interaction reduces the feeling of a "black box" and makes it easier for teams to maintain control over important decisions.

Codex interface with GPT-5.3

An agent that understands repositories and creates complex games, websites, and applications

One of the major changes compared to previous Codex models is the ability to understand complete code repositories. Before touching a single line, GPT-5.3-Codex scans the file tree, identifies key modules, reviews tests, and reads available documentation. With this comprehensive view, it can develop coherent work plans, avoid regressions, and propose extensive refactorings without undoing previous team decisions.
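To make the idea of a repository survey concrete, the following is a minimal sketch of what a first pass over a codebase might look like. It does not describe how GPT-5.3-Codex works internally, which OpenAI has not detailed; the function names (`classify`, `survey_repository`) and the naming heuristics are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical heuristics for this sketch; real projects vary widely.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}
MANIFESTS = {"package.json", "pyproject.toml", "requirements.txt", "Cargo.toml", "go.mod"}
TEST_DIRS = {"tests", "test", "__tests__"}


def classify(path: Path) -> str:
    """Assign a repository-relative file path to a coarse category."""
    name = path.name.lower()
    parent_dirs = {part.lower() for part in path.parts[:-1]}
    if path.name in MANIFESTS:
        return "dependencies"
    if name.startswith("test_") or ".test." in name or ".spec." in name or parent_dirs & TEST_DIRS:
        return "tests"
    if path.suffix.lower() in DOC_SUFFIXES or "docs" in parent_dirs:
        return "docs"
    return "source"


def survey_repository(root: str) -> dict[str, list[str]]:
    """Walk the file tree under root and group files by category."""
    root_path = Path(root)
    summary = defaultdict(list)
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Ignore directories themselves and anything inside skipped folders.
        if not path.is_file() or SKIP_DIRS & set(rel.parts):
            continue
        summary[classify(rel)].append(rel.as_posix())
    return dict(summary)
```

Calling `survey_repository(".")` from a project root returns a dictionary with keys such as `"source"`, `"tests"`, `"docs"`, and `"dependencies"`. A real agent would go much further (parsing imports, reading the documentation it finds, running the tests), but even this coarse map shows why surveying before editing helps avoid regressions.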

This high-level understanding translates into concrete tasks such as refactoring large blocks of code, detecting complex bugs that appear only in certain scenarios, or adapting a codebase to new requirements without having to rewrite everything from scratch. The model is also capable of reviewing its own work, running tests, and correcting errors it detects in its own proposals.

OpenAI has shown practical examples that illustrate this capability, such as creating video games and web applications from scratch in a matter of days. In one of the internal demonstrations, the model was asked to develop a diving game and an improved version of an existing racing game. With fairly generic instructions, such as "fix the bug" or "improve the game," GPT-5.3-Codex iterated over millions of tokens until both projects were fully functional.

In another case, the company compared the creation of two similar web pages using GPT-5.2-Codex and the new version. While the previous model resolved the request in a more basic way, GPT-5.3-Codex added, on its own initiative, elements such as an annual pricing plan with discounts or a testimonial carousel, demonstrating a better understanding of what is typically expected in a professional web product.

In addition to writing backend and frontend code, the model also handles tasks that surround pure development, such as generating technical documentation, drafting PRDs, preparing presentations in formats like PowerPoint or PDF, or creating spreadsheets with metrics and reports. The underlying idea is that the agent not only programs, but also participates in the broader digital work surrounding a software project.

Software development with GPT-5.3-Codex

Benchmark results: fewer tokens and greater accuracy

To back up its claims, OpenAI has published results on several reference benchmarks used in the industry to measure the capability of programming models. In SWE-Bench Pro, a battery of tests that groups real-world issues from open-source projects in various languages, GPT-5.3-Codex achieves figures that the firm describes as record levels for its catalog of models.

In Terminal-Bench 2.0, a set of tasks focused on console work (installing dependencies, managing files, running scripts, and routine system operations), the model scores around 77%. This score far surpasses GPT-5.2-Codex and outperforms direct competitors in these types of tests. This advantage suggests that GPT-5.3-Codex is particularly well-suited to terminal-based workflows.

In benchmarks more geared towards the use of a full desktop environment, such as OSWorld, the model maintains solid performance, although the overall picture shows a distribution of strengths among different providers. In any case, the general trend is clear: greater accuracy with fewer tokens. This translates into lower costs and a more agile experience when working on long tasks.

OpenAI also cites notable results in GDPval, an internal assessment focused on well-defined knowledge tasks spanning dozens of different occupations. In these tests, GPT-5.3-Codex acts as a professional capable of combining programming with writing, data analysis, and digital office tasks.

It's worth remembering, however, that most of these figures come from the company itself and should be interpreted with some caution. Although benchmarks help compare models, the real differences depend heavily on the type of project, the language, the quality of the repository, and the clarity of the instructions given to the agent.

GPT-5.3-Codex Programming Agent

An AI that helps itself develop

Beyond the numbers, one of the most striking aspects of the announcement is that GPT-5.3-Codex has been instrumental in its own development. OpenAI explains that it used early versions of the model to debug the training process, analyze results, and propose improvements to the architecture and the data used.

This does not mean that the AI programmed itself without human intervention, but it does mark a change in approach: the system itself has been used to identify failure patterns, suggest adjustments, and review some of the engineering work. In a way, the tool has served as support for building its next iteration, shortening experimentation cycles and reducing manual effort in some phases.

This type of self-support does, however, present additional challenges. When a model participates in its own evaluation, it is essential to have external controls, independent verification, and strict security criteria to avoid biases and errors that might go unnoticed. OpenAI claims to have maintained constant human oversight throughout the process, with teams dedicated to reviewing both the model's behavior and the quality of the data.

The company frames this strategy within a broader line of work, in which the models become internal tools for their own creators. From code review to the automation of certain deployment tasks, GPT-5.3-Codex has been used as an additional team member within the organization.

This approach aligns with an idea that the company has long advocated: AI as a tool that amplifies human work. This also applies within the laboratories that develop it. In practice, this translates into faster testing cycles, but also greater responsibility in establishing clear limits and protocols.

Security and cyber defense: potential and limitations

Another dimension in which GPT-5.3-Codex excels is cybersecurity. OpenAI states that this model is the first in its catalog classified as "high capability" for tasks related to detecting software vulnerabilities, according to its own internal Preparedness Framework.

The system has been specifically trained to identify flaws in codebases and suggest patches, a capability that could be particularly interesting for open-source projects and European companies that must comply with stringent security and data protection regulations. However, the firm acknowledges that these same capabilities could be misused if adequate safeguards are not in place.

In this regard, OpenAI maintains that it has found no evidence that GPT-5.3-Codex can autonomously execute all the steps of a cyberattack from beginning to end. Even so, the company has opted for a cautious approach, implementing specific mitigation measures to reduce the likelihood of malicious uses, especially in relation to the automation of exploits.

These measures include dual-use security training, automated monitoring systems to detect risky behavior, and restrictions on certain advanced capabilities, particularly in channels where control is weaker. The phased rollout of the model, with more limited API access, is part of this strategy.

At the same time, the company has indicated its intention to collaborate with maintainers of popular projects to offer free security analysis of their repositories, using GPT-5.3-Codex as a tool to detect vulnerabilities that have not yet been revealed.

Where and how can GPT-5.3-Codex be used today?

Regarding availability, GPT-5.3-Codex can already be used with all paid ChatGPT subscriptions in countries where Codex is enabled, including those in Europe. Unlike other models, its integration centers on the dedicated programming-agent platform that the company has been building in recent months.

The model is available in the Codex desktop application for macOS, which functions as a hub for managing multiple development agents simultaneously. It can also be used via the web version, the terminal interface, and extensions for integrated development environments (IDEs), such as those used daily by many programmers in Spain and the rest of Europe.

In this initial phase, OpenAI maintains a more conservative policy on API exposure. Although some documents mention integration with third-party tools and CI/CD workflows, the company insists that it is phasing in access to ensure the model's use is as secure as possible. Full API access is planned as the next step, but without a firm timeline.

The system is clearly designed to function as an operational agent and not just a chatbot. In practice, this means that it can run commands, interact with the file system, run tests, and analyze results, always within the limits defined by the developers who integrate it into their workflows.
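The notion of an agent acting "within the limits defined by the developers" can be illustrated with a small guard around command execution. This is only a sketch under stated assumptions: the `run_guarded` helper, the `CommandResult` type, and the allowlist policy are hypothetical and do not correspond to any documented Codex interface.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandResult:
    """Outcome of one command an agent asked to run (hypothetical type)."""
    args: list
    allowed: bool
    exit_code: Optional[int]
    output: str


def run_guarded(args: list, allowed_programs: set, timeout: float = 30.0) -> CommandResult:
    """Run a command only if its program is on the developer-defined allowlist."""
    if not args or args[0] not in allowed_programs:
        # The request is refused and nothing is executed.
        return CommandResult(args, False, None, "blocked by policy")
    completed = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return CommandResult(args, True, completed.returncode, completed.stdout + completed.stderr)


# Example policy: the agent may invoke the current Python interpreter and nothing else.
policy = {sys.executable}
ok = run_guarded([sys.executable, "-c", "print('tests passed')"], policy)
denied = run_guarded(["rm", "-rf", "build"], policy)
```

Here `ok.exit_code` and `ok.output` give the agent something to analyze, while `denied.allowed` is `False` and the command never runs. Production setups typically add sandboxing, working-directory restrictions, and audit logging on top of a simple allowlist like this one.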

For European teams, accustomed to navigating strict regulatory compliance requirements (from GDPR to future AI-specific frameworks), these capabilities will need to be carefully evaluated. The promise of increased productivity is clear, but so is the need to establish clear policies on what data is exposed to the model and how its interventions in the code are audited.

Impact on the daily work of developers

The arrival of GPT-5.3-Codex fits into a trend that many teams have already begun to notice in their daily work: the transition from “code auto-completion” to “agent that handles entire tasks”. Instead of requesting an isolated function, the idea is that the developer can delegate entire blocks of work, monitoring the result and correcting the direction when necessary.

In practice, this translates into scenarios quite familiar to programmers: building an application from scratch, connecting databases, preparing tests, locating an elusive bug that only appears in production, or reviewing permissions and dependencies in a large system. GPT-5.3-Codex can handle many of these steps, while the team focuses on deciding which product to build and what the business priorities are.

For less technically skilled individuals, the model opens the door to more active participation in building internal tools and prototypes. A person with a clear idea, even without mastering a specific framework, can rely on the agent to generate the foundation of the project, review options, and adjust the result through successive iterations. However, the final responsibility for aspects such as security, maintenance, and legal compliance still rests with qualified professionals.

OpenAI insists that GPT-5.3-Codex should not be understood as a direct replacement for developers, but rather as a much more capable co-pilot. The model can speed up routine work, but it still makes mistakes, needs supervision, and can sometimes propose solutions that work technically but do not fit the project objectives or the actual limitations of the environment where it will be deployed.

For teams in Spain and the rest of Europe, where many technology companies are gradually adopting AI, this type of agent also raises organizational questions: how to divide tasks, what part of the code is left to the model, what review processes are implemented, and how to document what the AI does to avoid technical debt in the medium term.

With all this context, GPT-5.3-Codex is emerging as a central piece in OpenAI's software development strategy: a model that combines speed, reasoning, and agentic capabilities, that helped build itself, and that aims to integrate into the daily work of programming teams and other professionals who spend their days in front of a computer. Its real impact will depend on how it is adopted in specific projects, the safeguards that are applied, and the extent to which European users feel comfortable incorporating such a powerful tool into processes that, until now, were exclusively human.
