TL;DR: Key Takeaways from the Experiment
- AI-generated code functions but often lacks structure and clarity.
- Implementing Test-Driven Development (TDD) enhances readability, maintainability, and modularity.
- Even AI exhibits caution in refactoring without tests, mirroring human developer behaviour.
- This experiment indicates that while AI can produce functional code, applying TDD significantly improves its quality.
Let’s delve into the details.
The Experiment: Can AI Write Good Code?
Curious about AI's ability to generate quality code, I tasked ChatGPT with creating a PHP plugin - a language and tech stack I'm not particularly familiar with. The goal was to evaluate the viability of AI-generated code.
Initial Observations:
- The code didn't work on the first attempt.
- Debugging took longer than anticipated.
- The code was difficult to comprehend.
- It consisted of a large, monolithic class with bloated methods.
These challenges prompted me to consider how AI-generated code would fare in a structured coding exercise, such as a simple Kata.
Questions Raised:
- What would the AI-generated code look like?
- Would it resemble code developed using TDD principles?
- Would it exhibit the common issues seen in code from developers new to TDD: functional, but riddled with code smells?
In coding dojos, when problems are solved by participants without TDD, the resulting code often works but lacks cleanliness. Even in simple Katas, issues like code smells, duplication, and poor structure are evident.
To explore this, I ran a quick experiment to see how ChatGPT would perform on a simple exercise.
Step 1: AI-Generated Code Without TDD – Identifying Issues
I prompted ChatGPT:
"We're running a coding dojo in C#, focusing on the Prime Factors Kata. Write a static method generate(int N) that returns a list of prime factors of N in ascending order"
ChatGPT’s Initial Code:

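The code screenshot from the original run isn't reproduced here. Purely as an illustration - not ChatGPT's verbatim output - a solution with the characteristics listed below tends to look something like this sketch: a single static method that works, but special-cases the factor 2 in a way that duplicates the divide-and-collect logic of the main loop, with no tests in sight.

```csharp
using System.Collections.Generic;

public class PrimeFactors
{
    // Returns the prime factors of n in ascending order.
    public static List<int> Generate(int n)
    {
        var factors = new List<int>();

        // Factor 2 is handled separately - duplicating the logic of the loop below.
        while (n % 2 == 0)
        {
            factors.Add(2);
            n /= 2;
        }

        // Trial-divide by every odd candidate up to the square root of n.
        for (int candidate = 3; candidate * candidate <= n; candidate += 2)
        {
            while (n % candidate == 0)
            {
                factors.Add(candidate);
                n /= candidate;
            }
        }

        // Whatever remains is itself a prime factor.
        if (n > 1)
        {
            factors.Add(n);
        }

        return factors;
    }
}
```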
Observations:
- The function works as intended.
- No accompanying tests.
- Contains unnecessary duplication.
- Utilizes overly complex loops.
While the function solves the problem, it's not crafted in a test-driven, clean, or maintainable manner.
Step 2: Applying Test-Driven Development (TDD) to AI-Generated Code
Next, I instructed the AI:
"Write the code, this time using a TDD approach"
This time, the approach differed.
AI's TDD Approach:
- Took small, incremental steps.
- Wrote failing tests first.
- Implemented minimal code to pass each test.
- Notably hesitant to refactor without tests.
AI’s TDD-Driven Final Code:

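Again, the screenshot isn't reproduced here. The sketch below is an illustrative reconstruction rather than the verbatim output (the xUnit test framework is my assumption, and the residual duplication noted in the observations isn't reproduced): tests written first, a minimal trial-division implementation behind them, and a test list that, as discussed below, jumps from the simplest cases straight to larger numbers such as 30, 56, 97, and 101.

```csharp
using System.Collections.Generic;
using Xunit;

public static class PrimeFactors
{
    // Implementation driven out by the tests below: divide out the smallest
    // remaining divisor until only 1 is left.
    public static List<int> Generate(int n)
    {
        var factors = new List<int>();
        for (int divisor = 2; n > 1; divisor++)
        {
            while (n % divisor == 0)
            {
                factors.Add(divisor);
                n /= divisor;
            }
        }
        return factors;
    }
}

public class PrimeFactorsTests
{
    [Fact]
    public void One_HasNoPrimeFactors() =>
        Assert.Equal(new List<int>(), PrimeFactors.Generate(1));

    [Fact]
    public void Two_IsItsOwnOnlyPrimeFactor() =>
        Assert.Equal(new List<int> { 2 }, PrimeFactors.Generate(2));

    // Note the leap: the remaining tests go straight to larger numbers.
    [Fact]
    public void Thirty_FactorsIntoTwoThreeFive() =>
        Assert.Equal(new List<int> { 2, 3, 5 }, PrimeFactors.Generate(30));

    [Fact]
    public void FiftySix_FactorsIntoTwoTwoTwoSeven() =>
        Assert.Equal(new List<int> { 2, 2, 2, 7 }, PrimeFactors.Generate(56));

    [Fact]
    public void NinetySeven_IsPrime() =>
        Assert.Equal(new List<int> { 97 }, PrimeFactors.Generate(97));

    [Fact]
    public void OneHundredAndOne_IsPrime() =>
        Assert.Equal(new List<int> { 101 }, PrimeFactors.Generate(101));
}
```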
Observed Improvements:
- Tests now drive the implementation.
- Improved structure with reduced complexity.
- Some duplication in handling numbers persists.
At first glance, AI appeared to follow TDD. However, this raised a critical question:
Did AI actually apply TDD principles, or did it simply execute predefined steps without understanding their purpose?
While AI wrote tests first and iterated, it did not fully embrace the incremental nature of TDD. It skipped essential baby steps - the small, guiding tests that are crucial for evolving a solution step by step.
Instead of progressing gradually, AI jumped ahead, testing larger cases (e.g., 30, 56, 97, and 101) before validating foundational ones like 3, 6, or 9. A true test-driven approach should encourage small, controlled iterations, ensuring that each step builds upon the last while maintaining clarity and correctness.
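For contrast, here is what a baby-step test list for this kata usually looks like - a sketch of my own, reusing the Generate method from the reconstruction above and again assuming xUnit. In practice each case would be added one at a time, with each new case forcing only a small change in the implementation.

```csharp
using Xunit;

public class PrimeFactorsBabyStepTests
{
    // The order in which the cases are added, and what each one forces:
    //   1      -> return an empty list
    //   2, 3   -> add the number itself when it is greater than 1
    //   4      -> introduce the inner "divide out repeatedly" loop
    //   6, 9   -> generalise the divisor beyond 2
    //   30     -> confirm the general solution; no new code needed
    [Theory]
    [InlineData(1, new int[] { })]
    [InlineData(2, new[] { 2 })]
    [InlineData(3, new[] { 3 })]
    [InlineData(4, new[] { 2, 2 })]
    [InlineData(6, new[] { 2, 3 })]
    [InlineData(9, new[] { 3, 3 })]
    [InlineData(30, new[] { 2, 3, 5 })]
    public void Generate_ReturnsPrimeFactorsInAscendingOrder(int input, int[] expected) =>
        Assert.Equal(expected, PrimeFactors.Generate(input));
}
```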

TDD vs. AI: Is AI Truly Applying Test-Driven Development?
At first glance, AI appears to follow the TDD process - writing tests, implementing code, and iterating. However, does AI truly understand TDD, or is it simply executing a set of predefined steps?
Here are key observations questioning whether AI genuinely understands the principles behind TDD:
- Delayed Refactoring – AI only refactors after completing the implementation rather than continuously improving the code with each test cycle, as TDD encourages.
- Lack of Clean Code Principles – AI doesn’t naturally apply software craftsmanship principles like meaningful naming, single responsibility, or reducing duplication, unless explicitly instructed.
- Skipping Essential Steps – AI often jumps to complex solutions too quickly, skipping small, incremental steps in the TDD cycle. Instead of testing for 2, then 3, then 9 before 30, it jumped from 4 to 40, violating the Transformation Priority Premise.
- Granularity of TDD Steps – While AI does iterate, it doesn’t always take the smallest possible step to evolve the implementation, potentially missing key learning moments.
The Transformation Priority Premise (as defined by Robert C. Martin) outlines a structured approach to incrementally evolving code. The idea is to introduce the simplest possible change before moving to a more complex one, ensuring code remains clean, adaptable, and well-tested throughout development.
AI violated this premise by making larger leaps in logic, skipping over the critical learning and validation steps that would normally drive incremental improvement in a true TDD approach.
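To make that concrete, here is a hedged sketch of my own - not taken from the experiment - of how the implementation evolves when each failing test is answered with the smallest transformation that makes it pass (roughly constant, then conditional, then loop). This is the gradual path the AI skipped.

```csharp
using System.Collections.Generic;

// Illustrative only: four snapshots of Generate as it evolves under small
// transformations, one failing test at a time.
public static class PrimeFactorsEvolution
{
    // After Generate(1): the simplest thing that passes is a constant.
    public static List<int> Step1(int n) => new List<int>();

    // After Generate(2) and Generate(3): a conditional replaces the constant.
    public static List<int> Step2(int n)
    {
        var factors = new List<int>();
        if (n > 1) factors.Add(n);
        return factors;
    }

    // After Generate(4): a while loop divides out repeated factors of 2.
    public static List<int> Step3(int n)
    {
        var factors = new List<int>();
        while (n % 2 == 0)
        {
            factors.Add(2);
            n /= 2;
        }
        if (n > 1) factors.Add(n);
        return factors;
    }

    // After Generate(9): an outer loop generalises the divisor, and the
    // kata is essentially done.
    public static List<int> Step4(int n)
    {
        var factors = new List<int>();
        for (int divisor = 2; n > 1; divisor++)
        {
            while (n % divisor == 0)
            {
                factors.Add(divisor);
                n /= divisor;
            }
        }
        return factors;
    }
}
```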
Building High-Performing Teams
This experiment was insightful but also raised concerns. Even with AI’s capabilities, code quality is never guaranteed - it can reinforce good practices or amplify bad ones. AI can help developers build products faster, but at what cost?
Perhaps this highlights why test-first and quality-first thinking is more crucial than ever. It is this approach that ensures software remains adaptable, maintainable, and valuable.
Anyone can write code - heck, even my gran could. But not everyone can write high-quality code that’s extensible, maintainable, and resilient over time. That’s why coding is more than just writing code. It’s about aligning with engineering and quality principles from Extreme Programming (XP) and software craftsmanship, enabling you to create software products that continuously evolve with the market and deliver value to your customers.
Even with AI’s ability to generate functional code, would you trust it to build mission-critical systems, like the software in modern cars that receives regular updates from manufacturers, without human oversight, test validation, and the application of quality principles?
This experiment reinforces that while AI is a powerful tool, it currently lacks the judgment, intentionality, and expertise found in true software craftsmanship and XP.
As we coach teams on what "good" looks like, AI may help accelerate the journey toward high-quality products. But for now, it still takes human expertise to ensure we're crafting scalable, adaptable software that truly serves customers.
Final Thought: See TDD in Action (Without AI!)
While this experiment revealed interesting insights into AI’s capabilities, there’s no substitute for hands-on experience.
Want to see TDD in action? Watch our video demonstration where we take a test-driven approach to solving the Prime Factors Kata - without AI.
Key Takeaways from Exploring TDD
- TDD isn’t just about writing tests.
- It’s about designing better software.
- It’s about learning to take small, safe steps.
We ran the experiment again, this time prompting the AI with the Transformation Priority Premise and small, incremental test steps before rerunning the dojo prompt. The result? AI’s response was noticeably closer to what we would write.
However, just because AI got it right in the end doesn’t mean the code was high quality from the start. It only improved with additional guidance, highlighting that even minimal principle-based checks are essential for producing clean, maintainable code.
If you're looking to strengthen your technical practices and build high-quality software, our Technical Excellence Program helps teams embed sustainable development practices. The Applying Professional Scrum for Software Development (APS-SD) also provides hands-on experience in applying Agile, DevOps, and technical excellence practices within Scrum teams.
At the end of the day, AI is a powerful tool, but it’s how we apply our expertise that truly makes the difference. 🚀
Your Thoughts?
- Have you used AI to assist with your coding? If so, how?
- Would you trust AI to generate code based solely on written instructions?
- Have you ever hesitated to refactor code in the absence of tests? Would you trust AI to change code without altering its behaviour?