Velocity 2009
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr — Presentation Transcript
- 10+ deploys per day: Dev & ops cooperation at Flickr. John Allspaw & Paul Hammond, Velocity 2009
- 3 billion photos; 40,000 photos per second
- Dev versus Ops
- “It’s not my machines, it’s your code!”
- “It’s not my code, it’s your machines!”
- Spock: Little bit weird; sits closer to the boss; thinks too hard. Scotty: Pulls levers & turns knobs; easily excited; yells a lot in emergencies
- Says “No” all the time; afraid that newfangled things will break the site; fingerpointy
- Ops stereotype: because the site breaks unexpectedly; because no one tells them anything; because they say “NO” all the time
- Traditional thinking: Dev’s job is to add new features; Ops’ job is to keep the site stable and fast
- Ops’ job is NOT to keep the site stable and fast
- Ops’ job is to enable the business (this is dev’s job too)
- The business requires change
- But change is the root cause of most outages!
- Discourage change in the interests of stability, or allow change to happen as often as it needs to
- Lowering risk of change through tools and culture
- Dev and Ops
- Ops who think like devs; devs who think like ops
- “But that’s me!”
- You can always think more like them
Tools
- 1. Automated infrastructure
- If there is only one thing you do…
- Role & configuration management: CFengine, Chef, BCfg2, Puppet
- OS imaging: FAI, SystemImager, Cobbler
- 2. Shared version control
- Everyone knows where to look
- 3. One step build and deploy
- [2009-06-22 16:03:57] [harmes] site deployed (changes...). Who? When? What?
- Small frequent changes
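The deploy log line shown earlier records when, who, and what in a single entry. A minimal Python sketch of producing such a line follows; the function name `format_deploy_entry` and its parameters are hypothetical and are not taken from Flickr's deploy tool.

```python
from datetime import datetime


def format_deploy_entry(user: str, when: datetime, summary: str) -> str:
    """Build one deploy-log line recording when, who, and what."""
    timestamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"[{timestamp}] [{user}] site deployed ({summary})"


# Reproduce the shape of the entry from the slide.
entry = format_deploy_entry("harmes", datetime(2009, 6, 22, 16, 3, 57), "changes...")
print(entry)
```

Writing the entry from the same step that performs the deploy keeps the log complete without relying on anyone remembering to update it.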
- 4. Feature flags (aka branching in code)
- Desktop software: 1.0, 1.0.1, 1.0.2, 1.1, 1.1.1, 1.2
- Web software: r2301, r2302, r2306
- Always ship trunk
- Everyone knows exactly where to look
- Feature flags. PHP: if ($cfg['enable_feature_video']) { … } Smarty: {if $cfg.enable_feature_beehive} … {/if}
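The same idea can be sketched in Python rather than the PHP and Smarty shown on the slide. The flag names mirror the slide; the helper `feature_enabled`, the function `render_photo_page`, and the section labels are hypothetical illustrations, not Flickr's code.

```python
# Hypothetical config; the two flag names are the ones shown on the slide.
cfg = {
    "enable_feature_video": True,
    "enable_feature_beehive": False,
}


def feature_enabled(config: dict, name: str) -> bool:
    """Return True only if the flag exists and is truthy; unknown flags are off."""
    return bool(config.get(f"enable_feature_{name}", False))


def render_photo_page(config: dict) -> list:
    """Assemble page sections, branching in code instead of in version control."""
    sections = ["photo"]
    if feature_enabled(config, "video"):
        sections.append("video")
    if feature_enabled(config, "beehive"):
        sections.append("beehive")
    return sections


print(render_photo_page(cfg))
```

Defaulting unknown flags to off means unfinished code can ship on trunk and stay invisible until its flag is set.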
- Private betas
- Bucket testing
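The slides do not say how users are assigned to buckets. One common approach, sketched here in Python as an assumption, is to hash the user ID together with the feature name so that each user lands in the same bucket on every request. The names `bucket_for` and `in_test_group` are hypothetical.

```python
import hashlib


def bucket_for(user_id: str, feature: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_test_group(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls within the first `percent` of 100 buckets."""
    return bucket_for(user_id, feature) < percent


# The same user always gets the same answer for the same feature.
print(in_test_group("user-42", "video", 10))
```

Including the feature name in the hash keeps the test groups for different features independent of each other.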
- Dark launches
- Start fetching data behind the scenes before turning feature on
- Application ignores data
- Free contingency switches
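A dark launch as described above can be sketched with two switches: one that makes the application fetch the new data, and one that makes it show the result. This is a minimal Python sketch under stated assumptions; the flag names, `fetch_new_data`, and `render_page` are hypothetical.

```python
# Hypothetical switches; both default to a dark launch in progress.
cfg = {
    "enable_dark_fetch": True,  # do the new work behind the scenes
    "enable_feature": False,    # show the result to users
}


def fetch_new_data(photo_id: int) -> dict:
    """Stand-in for the new backend call whose load is being exercised."""
    return {"photo_id": photo_id, "related": [photo_id + 1, photo_id + 2]}


def render_page(photo_id: int, config: dict) -> dict:
    """Render a page, fetching new data whenever either switch is on."""
    page = {"photo_id": photo_id}
    data = None
    if config["enable_dark_fetch"] or config["enable_feature"]:
        data = fetch_new_data(photo_id)  # the backend sees real production load
    if config["enable_feature"]:
        page["related"] = data["related"]  # only now does the user see it
    # With the feature off, `data` was fetched and is simply ignored.
    return page


print(render_page(1, cfg))
```

Turning both switches off stops the new work entirely, which is the contingency switch that comes for free with this structure.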
- 5. Shared metrics
- Application level metrics
- Ganglia - devs + ops have access
- Adaptive feedback loops (diagram labels: App, System, Metrics; “RU ok?”, “maybe?”)
- 6. IRC and IM robots
- Dev, Ops, and robots having a conversation (diagram labels: build, deploy, logs, alerts, monitors, IRC, search engine)
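A robot that reports deploys into a shared channel ultimately sends an IRC `PRIVMSG` command. The Python sketch below only formats that command and does not open a connection. The channel name `#ops` and the function names are hypothetical, and the message text reuses the deploy log format from the earlier slide.

```python
def irc_privmsg(channel: str, text: str) -> bytes:
    """Encode one IRC PRIVMSG command, terminated by CRLF as IRC requires."""
    # Replace CR/LF so the text cannot smuggle in a second IRC command.
    # Note: this sketch does not enforce IRC's 512-byte message limit.
    clean = text.replace("\r", " ").replace("\n", " ")
    return f"PRIVMSG {channel} :{clean}\r\n".encode("utf-8")


def announce_deploy(user: str, summary: str, channel: str = "#ops") -> bytes:
    """Format a deploy announcement for the shared channel."""
    return irc_privmsg(channel, f"[{user}] site deployed ({summary})")


print(announce_deploy("harmes", "changes..."))
```

Because the channel is logged, every announcement also becomes a searchable record alongside the human conversation around it.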
Culture
- 1. Respect: If there is only one thing you do…
- Don’t stereotype (not all developers are lazy)
- Respect other people’s expertise, opinions and responsibilities
- Don’t just say “No”
- Don’t hide things
- Developers: Talk to ops about the impact of your code:
- • what metrics will change, and how?
- • what are the risks?
- • what are the signs that something is going wrong?
- • what are the contingencies?
- This means you need to work this out before talking to ops
- 2. Trust
- Ops needs to trust dev to involve them in feature discussions
- Dev needs to trust ops to discuss infrastructure changes
- Everyone needs to trust that everyone else is doing their best for the business
- Shared runbooks & escalation plans
- Provide knobs and levers
- Ops: Be transparent, give devs (read-only) access to systems
- 3. Healthy attitude about failure
- Failure will happen
- If you think you can prevent failure then you aren’t developing your ability to respond
- Pilots spend time in simulators when they're not flying
- You want the EMT that deals with heart attacks every week, not once per year
- Fire drills
- 4. Avoiding Blame
- No fingerpointing
- Developers: Remember that someone else will probably get woken up when your code breaks
- Ops: provide constructive feedback on current aches and pains
- Tools: 1. Automated infrastructure; 2. Shared version control; 3. One step build and deploy; 4. Feature flags; 5. Shared metrics; 6. IRC and IM robots. Culture: 1. Respect; 2. Trust; 3. Healthy attitude about failure; 4. Avoiding Blame
- This is not easy. You could just carry on shouting at each other…