Introduction
Recently, our team started using GitHub Copilot, and everyone eagerly tried it out in various scenarios to see how much work it could save us. During the trial, however, a question came up: How should we practice Test-Driven Development (TDD) while using GitHub Copilot? Or do we even still need TDD?
To answer this question, we explored several different approaches and evaluated each one based on the following three dimensions:
- Development Experience: The fluency of the developer’s workflow and cognitive load.
- Task Granularity: How finely tasks must be broken down to suit GitHub Copilot’s execution capabilities.
- Task Completion Rate: The quality of the code generated by GitHub Copilot and the frequency of manual adjustments needed.
This article will detail these experiments and their evaluation results, and discuss the role and future of TDD under the support of AI coding assistants.
Attempted Solutions
In exploring how to combine GitHub Copilot with TDD, we used the same requirement scenario as the basis for every approach:
Requirement Description
**User Story:**
As a user, I want to store my bag in a locker and receive a ticket so that I can securely keep my belongings and retrieve them later using the ticket.
**Acceptance Criteria:**
1. Given the user has placed their bag in the locker,
When they close the locker door,
Then they should receive a ticket with a unique identifier.
2. Given the user has a valid ticket,
When they scan or enter the ticket at the locker station,
Then the corresponding locker should unlock, allowing them to retrieve their bag.
3. Given the user attempts to retrieve their bag without a valid ticket,
When they try to unlock the locker,
Then they should see an error message indicating that access is denied.
4. Given all lockers are currently occupied,
When a user tries to store their bag,
Then they should see a notification that the lockers are unavailable and be advised to try again later.
5. Given the locker system tracks capacity,
When a locker becomes available after retrieval,
Then the system should update its status to allow new bags to be stored.
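In TDD terms, acceptance criterion 1 maps directly onto a first unit test. A minimal sketch in Python of that test together with the simplest implementation that passes it; the `LockerSystem` and `Ticket` names are our own illustration, not part of the original requirement:

```python
# Hypothetical domain sketch for acceptance criterion 1.
# Names (LockerSystem, Ticket, store_bag) are illustrative assumptions.

import itertools


class Ticket:
    _ids = itertools.count(1)  # process-wide unique identifiers

    def __init__(self):
        self.ticket_id = f"T{next(Ticket._ids):04d}"


class LockerSystem:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.stored = {}  # ticket_id -> bag

    def store_bag(self, bag) -> Ticket:
        # Criterion 4: reject when all lockers are occupied.
        if len(self.stored) >= self.capacity:
            raise RuntimeError("Lockers unavailable, please try again later")
        ticket = Ticket()
        self.stored[ticket.ticket_id] = bag
        return ticket


def test_storing_a_bag_returns_ticket_with_unique_id():
    lockers = LockerSystem(capacity=2)
    t1 = lockers.store_bag("red bag")
    t2 = lockers.store_bag("blue bag")
    assert t1.ticket_id != t2.ticket_id
```

In the red phase, only the test exists and fails to import; the classes above are what the green phase then adds.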
Based on the above requirements, we tried the following main approaches and evaluated their effectiveness.
1. Traditional TDD Development Method + Code Completion
In this approach, we still followed the traditional TDD flow, but leveraged GitHub Copilot’s code completion to generate parts of the code as we wrote it.
Evaluation:
- Development Experience: Low fluency. Developers had to switch frequently between writing code by hand and waiting for completions, which scattered their attention. When the generated code was of low quality, it caused frustration and wasted time.
- Task Granularity: Finest; tasks were decomposed at the line level. Developers had to adjust code constantly, with no clear task boundaries or natural breaks.
- Task Completion Rate: Lowest. The generated code required manual modification most often, resulting in low efficiency.
2. Traditional TDD Development Method + Comments + Code Completion
In this approach, in addition to using the code completion feature, we added comments to guide GitHub Copilot to generate more accurate and larger-grained code.
Evaluation:
- Development Experience: Moderate fluency. Since comments are more flexible than code, developers could focus on describing tasks, reducing concerns about syntax or formatting, thereby lowering cognitive load.
- Task Granularity: Moderate; tasks were decomposed at the function level. For cross-function tasks, manual code adjustments were sometimes still necessary.
- Task Completion Rate: Moderate. By adjusting comments to regenerate code, the frequency of manual modifications decreased, but some intervention was still required.
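As an illustration of this approach, the developer writes a descriptive comment as the "prompt" and lets Copilot complete the body beneath it. A hedged sketch, with the kind of completion Copilot typically produces; the function name and error message are our own assumptions:

```python
# Comment-driven completion: the comment below acts as the prompt,
# and the body is representative of what Copilot completes from it.

def retrieve_bag(stored: dict, ticket_id: str):
    # Given a mapping of ticket_id -> bag, return the bag for a valid
    # ticket and free the locker slot; raise ValueError("Access denied")
    # for an invalid or already-used ticket (acceptance criterion 3).
    if ticket_id not in stored:
        raise ValueError("Access denied")
    return stored.pop(ticket_id)
```

Because the comment carries the intent, regenerating the body after tweaking the comment is cheaper than hand-editing the generated code.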
3. Traditional TDD Development Method + Edit + Prompts
This approach utilized GitHub Copilot’s Edit mode. We wrote each step of TDD as prompts to generate code that met the requirements.
Evaluation:
- Development Experience: Higher fluency. Developers focused on writing prompts and reviewing code, further reducing cognitive load. However, tests and refactoring still required separate prompt writing.
- Task Granularity: Larger; tasks were decomposed at the development-process level. Generated tests no longer targeted single test cases but covered multiple scenarios at once.
- Task Completion Rate: Higher. By modifying prompts or regenerating code, the frequency of manual code edits was reduced.
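For example, a red-phase prompt for acceptance criterion 3 in Edit mode might read as follows (our own wording, not a canonical template):

```
Add a failing unit test: when a user tries to unlock a locker with an
invalid or missing ticket, the system raises an "Access denied" error.
Cover both an unknown ticket id and an already-used ticket.
```

Separate prompts of the same shape were still needed for the green and refactor phases, which is what the next approach eliminates.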
4. Agent + TDD Prompts
Finally, we tried GitHub Copilot’s Agent mode combined with TDD prompts for development. The content of the prompts was as follows:
# User Task Breakdown
Decompose high-level software requirements into discrete, testable functionalities and implement them using the Test-Driven Development (TDD) methodology. For each functionality, follow these steps:
1. **Write Focused Tests**: Create precise unit tests for a single functionality or requirement, ensuring coverage of all possible scenarios, edge cases, and invalid inputs.
2. **Confirm Test Failure**: Execute the tests to verify they fail initially, confirming their validity before implementation begins.
3. **Implement Minimal Code**: Write the simplest code required to pass the tests, avoiding over-engineering or adding features not directly related to the current test cases.
4. **Verify Implementation**: Re-run the tests to confirm that the implemented code passes all test cases successfully. Debug and refine as necessary.
5. **Refactor**: Improve the code’s structure, readability, and performance while maintaining functionality, ensuring no tests break during the process.
6. **Validate Refactoring**: Run the tests again after refactoring to ensure the updated code still passes all test cases without introducing regressions.
We placed the above prompt in the file `.github/copilot-instructions.md` and used Agent mode for development.
Evaluation:
- Development Experience: High fluency. Developers focused on writing prompts and reviewing code, eliminating the need to write separate prompts for tests and refactoring, significantly reducing cognitive load.
- Task Granularity: Largest; tasks were decomposed at the functionality level. Developers did not need to manage development-process details and could focus solely on task decomposition and code review.
- Task Completion Rate: Highest. Before running tests, code adjustments could be made through GitHub Copilot Chat or manually, with minimal frequency of manual code edits. However, it is worth noting that Agent mode occasionally deviated from the TDD development model, possibly due to the design of the prompts.
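To make this concrete, the agent's red-green loop for criteria 4 and 5 produces a test plus implementation along these lines. This is a sketch under assumed names (`Lockers`, `store`, `retrieve`, `available`), not the literal output of any Copilot run:

```python
# Sketch of the kind of test-plus-implementation an agent-driven TDD
# loop yields for criteria 4 and 5 (capacity tracking). All names are
# illustrative assumptions.

class Lockers:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._bags = {}   # ticket -> bag
        self._next_id = 0

    def available(self) -> bool:
        return len(self._bags) < self._capacity

    def store(self, bag) -> str:
        if not self.available():
            # Criterion 4: all lockers occupied.
            raise RuntimeError("Lockers unavailable, please try again later")
        self._next_id += 1
        ticket = f"T{self._next_id}"
        self._bags[ticket] = bag
        return ticket

    def retrieve(self, ticket: str):
        if ticket not in self._bags:
            raise ValueError("Access denied")
        # Popping frees the slot, so availability updates (criterion 5).
        return self._bags.pop(ticket)


def test_full_lockers_reject_then_accept_after_retrieval():
    lockers = Lockers(capacity=1)
    ticket = lockers.store("bag A")
    try:
        lockers.store("bag B")
        assert False, "expected lockers to be full"
    except RuntimeError:
        pass
    lockers.retrieve(ticket)   # frees the locker
    assert lockers.available()  # criterion 5: status updated
```

The developer's remaining work is reviewing that the generated tests actually exercise each criterion, rather than writing the loop by hand.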
Do We Still Need TDD?
Traditional TDD is a practice that requires human maintenance, often being neglected or abandoned for various reasons. However, from the above attempts, it is evident that we can fully embed the TDD practice flow into GitHub Copilot in the form of prompts and maintain it in the code repository. This way, all team members can adopt a consistent development flow.
From this perspective, TDD itself no longer seems to need separate emphasis. This does not mean its importance has diminished; rather, AI coding assistants have made it possible to automate TDD. TDD is no longer a skill that takes significant time and effort to train and master, nor an optional extra in development, but a standard process automated into daily work. Developers can devote more effort to understanding requirements, breaking down tasks, and reviewing code.
This transformation is similar to the development history of Continuous Integration (CI). Before CI, developers had to manually compile and test locally and remember best practices. However, such processes couldn’t be enforced. With the advent of CI tools, these practices were solidified into scripts, freeing developers from spending extra effort on them, allowing them to focus on other more important work. Meanwhile, the likelihood of omissions or errors also greatly decreased.
Therefore, the role of TDD is transforming from a practice driven by human efforts to a flow automated by AI coding assistants. This not only improves development efficiency but also ensures higher code quality.
Conclusion
TDD is just one example among many traditional development practices that can be automated. By embedding the team’s best practices into AI coding assistants in the form of prompts, we can achieve a higher degree of standardization and consistency. AI coding assistants act like junior developers with rich knowledge but less experience; through carefully designed prompts, we can standardize their behavior, providing team members with a consistent usage experience.
This approach not only helps enhance team members’ capabilities but also optimizes the development process. In the future, with further advancements in AI tools, we might be able to automate more complex development practices, thereby unleashing developers’ creativity and enabling them to focus on solving more challenging problems.
Let’s embrace the power of AI and redefine the best practices of software development based on automation. This is not only a technological advancement but also a profound transformation in the development paradigm.