Snippets and thoughts from a passionate software developer interested in all things DevOps, Continuous Delivery, Functional Programming, Distributed Systems. Also on Twitter as @mcallana.
Thursday, February 27, 2014
How Etsy do Code Reviews with Continuous Delivery (incl. branching, culture)
Continuous Delivery strongly advocates developing on mainline/master/trunk. It also advocates releasing as frequently as possible, pointing out that Continuous Deployment, where all commits that pass all the stages of the build pipeline are automatically released, is the ultimate logical extension. But how do teams get a chance to perform code reviews if all commits to master are potential release candidates? This question was posed on the Continuous Delivery Google Group recently, with a response from Etsy's Mike Brittain about how they do it. TL;DR: It involves temporary short-lived branches for reviews, and a culture of keeping changes small, not deploying without review, and being good at assessing the risk of releasing untested code.
Are code reviews mandatory? Any formal process or tool that "gates" check-ins to trunk, based on peer review?
Code review is not mandatory, nor does it gate check-ins to trunk or deployments to production.
One of the hangups we have about gating deploys in this fashion is that gating artificially slows us down during emergencies/outages, and, as Jez put it in the topic you linked to, gating assumes stupidity that has to be corrected for.
We have built the culture in our Engineering team to assume everyone is going to do the right thing, and we should trust them. When that trust falls apart, I think of it as a bug in our hiring and onboarding processes.
It is socially unacceptable at Etsy to deploy without some form of review, and that's true for nearly every sub-team.
Many of our reviews happen in GitHub (Enterprise), and so have an audit trail attached to them of who reviewed and what was discussed.
Other code reviews might be a simple copy-and-paste of a patch file, or an over the shoulder review by a peer.
Most changes are relatively small and can be easily grokked.
With respect to config flags (feature switches) and other *new* code:
It's possible that we'll deploy code to our production servers that is not live (e.g. "dark code"), but that a more formal code review will happen later when all of the changes for some new feature or product are complete.
Maybe this is just a side effect of the fact that we don't use feature branches, and that we think of committing to git as deploying to web servers (the two happen nearly in sync).
That doesn't mean that the code will be live to a public user. (Think: new class files, or new and unlinked JS or CSS, features that are gated off so that they are only available to staff members).
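A minimal sketch of what this kind of "dark code" gating might look like. This is not Etsy's actual feature-flag system; the `FLAGS` dict, `is_enabled`, and the staff-only check are illustrative assumptions:

```python
# Hypothetical config-flag gate: the new feature is deployed to
# production but only reachable by staff members until it is flipped on.
FLAGS = {
    "new_checkout_flow": {"enabled": False, "staff_only": True},
}

def is_enabled(flag_name, user):
    """Return True if the named flag is live for this user."""
    flag = FLAGS.get(flag_name)
    if flag is None:
        return False            # unknown flags default to off
    if flag["staff_only"]:
        return user.get("is_staff", False)
    return flag["enabled"]

def checkout(user):
    if is_enabled("new_checkout_flow", user):
        return "new checkout flow"      # dark code: deployed, gated off
    return "existing checkout flow"
```

With a gate like this, the new code path can be committed to master and deployed immediately, while the public keeps seeing the existing flow until review and testing are complete.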
Is it likely that any change which would negatively impact customers in the short term will be caught by automated tests?
I'd argue that there are a number of changes that could negatively impact consumers or, more to the point, their behavior, that would never be caught by automated tests.
These are the types of things you might think of for classic A/B tests—size, shape, position, color of UI elements, feature discoverability, fall-off in multi-page flows.
And that is why we move quickly to get these changes to the (A/B) testing phase.
This statement seems to imply that automated tests are the only way to catch bugs. We find a lot of bugs through production monitoring that we would have never conceived of writing tests for.
We have the benefit of operating our own products - not dealing with contractual obligations to deliver "bug-free" code to clients.
We're deeply immersed in the trade-offs of testing vs. speed, and I think that's well understood by many engineers on the team who are making decisions around those trade-offs every day.
We regularly make mistakes at this, identify the mistakes through open discussion, and those engineers who were involved come away more experienced and wiser for it.
Assumption of "under-reviewed code"?
There are many cases where we deploy code, even publicly, that you would consider to be "under-reviewed".
The relative degree of testing that we apply is proportional to the amount of risk involved.
We are decent at assessing and socializing risk :)
Our Payments team, for example, is much more rigid about their review process than teams that, say, are running experiments at the UI level.
It's not one-size-fits-all.
You mentioned that you don't use feature branches, but also mentioned that most of your reviews are done in GitHub Enterprise?
What you're pointing out is some subtlety around how we refer to branches and reviews.
In terms of development process, we don't use feature branches in the traditional sense—especially if you think of branches existing for more than a day.
In practice, there are certainly cases where individuals will use a short-lived branch to isolate a couple of things they're working on at a time, much like you might use git stash—such as to switch context to work on an immediate bug fix that needs to go to production.
Our review process does utilize branches, but only to support the review itself.
As an engineer prepares to commit a change set, they create a (short-lived) review branch in git.
This allows us to look at the review in GitHub Enterprise and use their commenting and visual diffing.
It also allows other engineers to checkout the branch and put it through its paces, when necessary.
Once the code review is complete, the branch is typically auto-merged in GitHub when the engineer is prepared to deploy it.
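The review-branch flow described above can be sketched with plain git commands. This is a self-contained demo in a throwaway repo; the branch and file names are illustrative, and in practice the review step would be a pull request in GitHub Enterprise rather than a local merge:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev
mainline=$(git symbolic-ref --short HEAD)   # master or main, per local config

echo "v1" > app.txt
git add app.txt
git commit -qm "initial commit"

# 1. Prepare the change set on a short-lived review branch.
git checkout -qb review/fix-listing-page
echo "v2" > app.txt
git commit -qam "Fix listing page"

# 2. (Review happens here: push the branch, open a PR, discuss the diff,
#    and let other engineers check the branch out and put it through its paces.)

# 3. Once review is complete, merge back to the mainline and deploy.
git checkout -q "$mainline"
git merge -q --no-ff review/fix-listing-page -m "Merge review branch"
git branch -qd review/fix-listing-page      # the branch is short-lived: delete it
```

The key point is that the branch exists only for the duration of the review; development still happens against the mainline, and the merge coincides with being ready to deploy.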
Yes, it's a subtle (and perhaps stubborn) distinction. :)
It seems that "time to review" would be a fairly key metric in that case.
This is true, but it's not really a problem.
Daily stand-ups generally work for socializing if you're blocked on someone's feedback on a code review.
For your review process, would you say you're using a variant of what GitHub calls "GitHub Flow"?