OpenAI's New Tax AI Fixes Its Own Code with Codex, Skyrocketing to 86% Accuracy
Imagine a world where software doesn't wait for miserable, sleep-deprived human developers to patch its endless bugs. Instead, one algorithm watches another mess up, sighs digitally, and quietly rewrites the code itself. Welcome to the future of doing taxes.
A partnership between tech behemoth OpenAI and investment firm Thrive Holdings produced an AI agent called Tax AI and dropped it straight into the trenches of the accounting network Crete. During a brief six-week trial, the system processed over 7,000 tax filings. The jaw-dropping part isn't just that the robot survived tax season, but that it reportedly got four times better at its job while human engineers barely lifted a finger.
Normally, when an AI model makes a stupid mistake, a team of exhausted developers has to dig through endless text logs, decipher user complaints, and manually rewrite the instructions. This loop usually takes weeks and relies on the fragile, caffeine-fueled brain of a human programmer. In this new setup, OpenAI largely cut out the middleman by plugging in Codex, its code-generating model, to act as the ultimate quality assurance supervisor.
This supervisor agent constantly watches Tax AI work, compares its output with the actual corrections made by human accountants, and immediately modifies the underlying code and prompts. It is basically a tireless digital manager that corrects its intern's homework in real time, only sending the most bizarre and confusing edge cases to a real human engineer for a final review.
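For the curious, here is a rough Python sketch of what the "compare and triage" half of such a loop could look like. It is not OpenAI's implementation: the `Review` record, the `triage` function, and the rule of escalating any filing with more than two corrected fields are all hypothetical, invented here purely for illustration. It only diffs the agent's draft against the accountant's final values, scores field-level accuracy, and decides which filings a supervisor might patch automatically and which go to a human engineer.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One filing: the agent's draft fields and the accountant's final values."""
    filing_id: str
    draft: dict
    corrected: dict


def diff_fields(review):
    """Return fields the accountant changed, mapped to (draft, corrected)."""
    return {
        name: (review.draft.get(name), value)
        for name, value in review.corrected.items()
        if review.draft.get(name) != value
    }


def triage(reviews, escalate_above=2):
    """Score field-level accuracy and split corrected filings into two queues.

    escalate_above is a made-up threshold: filings with more corrected
    fields than this are treated as confusing edge cases for a human.
    """
    total = wrong = 0
    auto_fix, escalate = [], []
    for review in reviews:
        changes = diff_fields(review)
        total += len(review.corrected)
        wrong += len(changes)
        if not changes:
            continue  # the draft matched the accountant, nothing to learn
        queue = escalate if len(changes) > escalate_above else auto_fix
        queue.append(review.filing_id)
    accuracy = 1 - wrong / total if total else 1.0
    return accuracy, auto_fix, escalate
```

In a real system the `auto_fix` queue would feed whatever rewrites the prompts and code, which is the part this sketch deliberately leaves out.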
Instead of testing this loop on simple, boring tax forms, the team threw the AI straight into the absolute nightmare of K-1 forms, which handle complex partnership earnings and messy, non-standard investment income. With accuracy now at 86%, the ultimate goal is to push this self-correcting loop to 97%.
This self-healing trick isn't entirely new for the creators of ChatGPT, who previously admitted that an early version of GPT-5.3-Codex was used to debug and train itself. Now, they are scaling this internal superpower to external industries, proving that code can evolve in the wild.
One senior accountant reported spending a mere 15 hours on tax preparation instead of the grueling 180 hours they logged the year before, instantly redirecting that free time to client relations and scaling their business. On average, the accounting team cut their document preparation time by a third.
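Those two numbers describe very different things, and a quick back-of-the-envelope calculation makes the gap obvious. The helper name `percent_reduction` is just an illustrative choice; the hours are the ones quoted above.

```python
def percent_reduction(before, after):
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100


# The single senior accountant: 180 hours last year, 15 hours this year.
senior_saving = percent_reduction(180, 15)
senior_speedup = 180 / 15

# The team average quoted in the article: preparation time cut by a third.
team_saving = 100 / 3

print(f"Senior accountant: {senior_saving:.1f}% less time ({senior_speedup:.0f}x faster)")
print(f"Team average:      {team_saving:.1f}% less time")
```

In other words, the headline-grabbing accountant saved roughly 92% of their time, while the typical team member saved about 33%, so the star example is an outlier rather than the norm.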
Watching algorithms automate their own maintenance suggests the traditional software engineer might soon join the ranks of the manual loom weaver. While accountants get their lives back, the tech industry is quietly proving that the best way to fix buggy software is to stop letting humans touch it altogether.
Source: OpenAI