Collaborative Code Review
It occurred to me that, despite the number of times I’ve been asked what my Master’s thesis is all about, I’ve never sat down and described it outside of my own notes and drafts. So I’m going to talk a bit about the motivations and ideas behind what I’m doing, since the actual technical side isn’t particularly interesting.
The idea of code review is a very simple one: if you have someone else look over your code, they can probably help you make it better. Many software companies incorporate this into their development workflow, and it is not uncommon for code review to be mandatory before new code is accepted. While this works well in industry (though not without its faults), it relies on the assumption that every developer writing code is also a code reviewer. If all code gets a second set of eyes, and you assume that reviewing code takes much less time than writing it in the first place, then the overhead of mandatory code review is fairly palatable.
When we look instead at teaching a moderately large programming class with a significant amount of programming work, it becomes essentially impossible for the teaching staff to look over student code in any detail beyond superficial correctness, code style, and perhaps a few high-level concepts. While this is depressing, it is also inevitable when you look at the numbers (using approximate figures from a recent iteration of MIT’s 6.005: Elements of Software Construction):
100 students × 3 projects × 3,000 lines per project / 3 students per group
= 300,000 lines of code
300,000 lines of code / 4 TAs / 15 weeks per semester
= 5,000 lines of code per TA per week
As a TA, I’m not comfortable with the idea of reading 1,000 lines of code per day (5,000 lines spread over a five-day week). And that doesn’t even include the smaller problem sets we had throughout the semester.
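The estimate above is simple enough to check mechanically. Here is a minimal Python sketch using the approximate 6.005 figures, assuming (as the 1,000-lines-per-day figure does) that TAs review code on five days a week; the constant names are illustrative only.

```python
# Back-of-the-envelope review workload, using the approximate 6.005 figures.
STUDENTS = 100
PROJECTS = 3
LINES_PER_PROJECT = 3000
STUDENTS_PER_GROUP = 3
TAS = 4
WEEKS = 15
WORKDAYS_PER_WEEK = 5  # assumption: TAs review code on weekdays only


def review_load():
    """Return (total lines, lines per TA per week, lines per TA per day)."""
    # Each group of 3 students submits one project, so divide by group size.
    total_lines = STUDENTS * PROJECTS * LINES_PER_PROJECT // STUDENTS_PER_GROUP
    per_ta_per_week = total_lines // TAS // WEEKS
    per_ta_per_day = per_ta_per_week // WORKDAYS_PER_WEEK
    return total_lines, per_ta_per_week, per_ta_per_day


total, weekly, daily = review_load()
print(f"{total:,} lines total; {weekly:,} per TA per week; {daily:,} per TA per day")
# prints: 300,000 lines total; 5,000 per TA per week; 1,000 per TA per day
```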
How can we make sure that student code gets looked over and corrected?
One approach is to add more reviewers by bringing the students themselves into the review process. This has three advantages: it reduces the load on the teaching staff, teaches the students how to review code, and exposes students to each other’s code and comments. It is easy to imagine a class where, after an assignment is due, each student reviews some amount of code written by classmates; the feedback is then collected and presented both to the authors, for their own benefit, and to the teaching staff, to help with grading.
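The post-deadline assignment step could work in many ways. One simple possibility is a greedy least-loaded scheme: each student is handed the pieces of code (called chunks here) that currently have the fewest reviewers, excluding anything written by their own group. This is a hypothetical sketch; `assign_reviews`, the chunk IDs, and the group structure are illustrative assumptions, not part of an existing tool.

```python
import random
from collections import defaultdict


def assign_reviews(chunk_authors: dict[str, set[str]], students: list[str],
                   reviews_per_student: int, seed: int = 0) -> dict[str, list[str]]:
    """Greedily give each student the least-reviewed chunks they did not help write.

    chunk_authors maps a (hypothetical) chunk ID to the set of students in the
    group that wrote it, so nobody is asked to review their own group's code.
    """
    rng = random.Random(seed)
    chunks = list(chunk_authors)
    rng.shuffle(chunks)          # randomize so ties don't always favor the same chunks
    load = defaultdict(int)      # number of reviewers assigned so far, per chunk
    assignments = {}
    for student in students:
        eligible = [c for c in chunks if student not in chunk_authors[c]]
        eligible.sort(key=lambda c: load[c])  # stable sort: ties keep the shuffled order
        picked = eligible[:reviews_per_student]
        for c in picked:
            load[c] += 1
        assignments[student] = picked
    return assignments
```

Sorting by current load spreads reviewers roughly evenly across chunks, though a single greedy pass does not guarantee a perfectly balanced result.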
Existing review systems are built for small teams of experienced developers whose primary goal is to improve code quality, but our problem is fundamentally different. A classroom code review tool needs to serve a large group of students of varying skill levels together with a small group of (hopefully experienced) teaching staff. Furthermore, while code quality is important, the primary goal in a classroom is to teach students how to be better software engineers. By designing a few key features around the characteristics of our reviewer population, we can improve both the quality of the reviews and the quality of the learning that results:
- Code partitioning – Because it would be infeasible to simply hand an entire project to a student for review, student code needs to be partitioned into small “chunks”, perhaps on the order of 50 lines each, to distribute to peers.
- Discussion support – Users can reply to and vote on existing comments, allowing students to communicate directly with each other in the context of the code.
- Reputation system – Users are rewarded for high-quality comments, incentivizing participation. This can be in the form of “karma” points, achievements or badges, or both.
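A minimal sketch of the code-partitioning step, assuming chunks are cut at the first blank line once they reach roughly 50 lines, with a hard cap at twice that size so a long unbroken method still gets split. The function name and the blank-line heuristic are assumptions for illustration; a real tool would more likely split along syntactic boundaries such as method definitions.

```python
def partition(source: str, target: int = 50) -> list[list[str]]:
    """Split source code into chunks of roughly `target` lines (hypothetical heuristic)."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines():
        current.append(line)
        reached_target_at_blank = len(current) >= target and line.strip() == ""
        hit_hard_cap = len(current) >= 2 * target
        # Prefer to close a chunk at a blank line once it is big enough;
        # otherwise force a break so no chunk grows without bound.
        if reached_target_at_blank or hit_hard_cap:
            chunks.append(current)
            current = []
    if current:  # whatever remains becomes the final, possibly smaller, chunk
        chunks.append(current)
    return chunks
```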
The other approach is to bring in better tools to streamline and automate the review process as much as possible. This kind of tool doesn’t quite exist yet in the wild. Our current state-of-the-art tools for ensuring code quality fall largely into two categories: completely automated systems like testing frameworks and static analysis, and completely user-driven tools like existing code review systems (Gerrit, Rietveld, Review Board).
Ideally, our tool would sit between these two extremes, increasing the efficiency and quality of code review by automating tedious work and gathering relevant data. Our users are experience-rich and time-poor, so the tool’s primary responsibility is to decide how best to direct their attention.
- Automatically generated feedback – Existing static analysis tools (Checkstyle, FindBugs) can often generate many of the same comments that human reviewers would otherwise leave. Why not feed the output of those tools into the same interface that human reviewers use, and let users upvote, downvote, and reply to those comments?
- Clustering similar code – If a piece of code is unfamiliar to the reviewer, it could be extremely helpful to look at a piece of similar code somewhere else in the system that has already been reviewed and annotated.
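Clustering needs some notion of code similarity. One simple option, swapped in here for illustration rather than taken from the thesis, is Jaccard similarity over token 3-grams (shingles), with identifiers and numbers normalized so that a renamed loop variable still matches. `most_similar` and the partial Java keyword list are hypothetical.

```python
import re

# Partial, illustrative list of Java keywords kept verbatim during normalization.
JAVA_KEYWORDS = {"if", "else", "for", "while", "return", "new", "int", "void",
                 "public", "private", "static", "class", "boolean", "null"}


def shingles(code: str, k: int = 3) -> set[tuple[str, ...]]:
    """Token k-grams with identifiers -> 'ID' and integer literals -> 'NUM'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    norm = ["ID" if re.match(r"[A-Za-z_]", t) and t not in JAVA_KEYWORDS
            else "NUM" if t.isdigit() else t
            for t in tokens]
    return {tuple(norm[i:i + k]) for i in range(len(norm) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(target: str, reviewed: dict[str, str], k: int = 3):
    """Return (chunk_id, score) for the already-reviewed chunk closest to target."""
    t = shingles(target, k)
    best = None
    for chunk_id, code in reviewed.items():
        score = jaccard(t, shingles(code, k))
        if best is None or score > best[1]:
            best = (chunk_id, score)
    return best
```

A reviewer looking at an unfamiliar chunk could then be shown the annotations on the highest-scoring reviewed chunk, if its score clears some threshold.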
In short, my thesis work aims to improve both the code review process itself and software engineering education by bringing students (and others) into the review process and giving them an intelligent, easy-to-use tool that streamlines that process as much as possible.