PuppetConf 2013 was held on 22nd & 23rd August. I tuned in to the live streaming and took the notes below.

Full list of talks with links to pages with video and slides: http://puppetlabs.com/resources/puppetconf-2013
Full playlist of videos: http://www.youtube.com/playlist?list=PLV86BgbREluU02Ytlz80seDSKAbkx5pRg

Thursday 2013-08-22 [Friday BNE time]

9:40am [2:40am] • Keynote: Why Did We Think Large Scale Distributed Systems Would be Easy? – Gordon Rowell - Google
10:15am [3:15am] • Keynote: Open Sourcing the Cloud – Brian Stevens - RedHat
11:10am [4:10am] • How Do We Better Sell DevOps? – Gene Kim
1:30pm [6:30am] • Nobody Has To Die Today: Keeping The Peace With The Other Meat Sacks – Mykel Alvis
2:20pm [7:20am] • Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet – Bess Sadler - Stanford
3:10pm [8:10am] • So You've Got Scalability. Now What? – Carla Souza – Reliant
4:20pm [9:20am] • Multi-Provider Vagrant: AWS, VMware, and More – Mitchell Hashimoto
5:10pm [10:10am] • Building Data-Driven Infrastructure with Puppet – James Fryman - GitHub

Keynote: Why Did We Think Large Scale Distributed Systems Would be Easy?

http://puppetlabs.com/presentations/keynote-why-did-we-think-large-scale-distributed-systems-would-be-easy

Google's Corporate Engineering SRE team provides infrastructure services used by many of Google's desktops, laptops and servers. This talk gives an overview of the design philosophy, challenges, technologies and some interesting failures seen while implementing infrastructure at scale.

Gordon Rowell

Site Reliability Manager, Google

Notes:

Post-mortems not about blame – lots to learn from medical community
Thundering herds: Randomise cron jobs
Anycast
- Helps you be consistent
- Traffic could go anywhere
- Can I handle any type of query?
- IP address injection in route table
- TCP or UDP
- Won’t maintain state
Cascading failures far worse than dropping traffic
Diversity for people – not for platforms
If you can’t automate it, you shouldn’t do it
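
The "randomise cron jobs" note above can be sketched as follows. This is only an illustration, not what Google actually runs: it derives a stable per-host minute from a hash, so a fleet of machines spreads a job across the hour instead of all firing at minute 0. The function names, hostnames and command path are hypothetical.

```python
import hashlib


def splay_minute(hostname: str, job: str, window_minutes: int = 60) -> int:
    """Return a stable offset in [0, window_minutes) for this host and job."""
    digest = hashlib.sha256(f"{hostname}:{job}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then fold into the window.
    return int.from_bytes(digest[:8], "big") % window_minutes


def cron_line(hostname: str, job: str, command: str) -> str:
    """Build an hourly crontab entry whose minute differs from host to host."""
    minute = splay_minute(hostname, job)
    return f"{minute} * * * * {command}"


if __name__ == "__main__":
    # Hypothetical hosts and command, for illustration only.
    for host in ("web01.example.com", "web02.example.com", "db01.example.com"):
        print(host, "->", cron_line(host, "hourly-job", "/usr/local/bin/hourly-job"))
```

Hashing rather than calling a random number generator keeps the schedule the same on every run, so the crontab does not churn each time it is regenerated.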

Q: Does DNS round-robin depend on the client?

SNMP

Canary
- Dev, Test, Stage envs for devs
- Turn some machines into canary machines
- Full cluster of canaries
- Then Rollout
How to round-robin DB backends?
- Typically local LB
- Not anycast
- Tricky
- Depends how persistent
- Inter-transaction consistency
- No good/complete answer
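
The canary steps noted above (a few canary machines, then a full cluster of canaries, then the rollout) could be sketched roughly like this. It is a minimal illustration under stated assumptions: the `deploy` and `healthy` callables, the stage names and the hostnames are all hypothetical, and the talk did not describe the real tooling.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a label plus the hosts that receive the change in that stage.
Stage = Tuple[str, Sequence[str]]


def staged_rollout(stages: Sequence[Stage],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Deploy stage by stage and stop as soon as any host in a stage is unhealthy."""
    completed: List[str] = []
    for name, hosts in stages:
        for host in hosts:
            deploy(host)
        unhealthy = [host for host in hosts if not healthy(host)]
        if unhealthy:
            # Later stages are never touched, which limits the blast radius.
            raise RuntimeError(f"stage {name!r} unhealthy on {unhealthy}; rollout halted")
        completed.append(name)
    return completed


if __name__ == "__main__":
    stages = [
        ("canary machines", ["web01"]),
        ("canary cluster", ["web02", "web03"]),
        ("full rollout", ["web04", "web05", "web06"]),
    ]
    deployed: List[str] = []
    print(staged_rollout(stages, deployed.append, lambda host: True))
```

The point of the structure is that a bad change fails on the smallest group first and never reaches the later stages.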

Keynote: Open Sourcing the Cloud

http://puppetlabs.com/presentations/keynote-open-sourcing-cloud

Red Hat is putting serious emphasis on cloud computing – with the goal of building agile infrastructure and platform clouds, which can be used to free developers and IT to do great things, faster. Brian will talk about how Red Hat’s “all-in” technology investments will help make this happen, including the external, upstream open source development model for RHEL, the Red Hat OpenStack community and the elasticity of deploying OpenShift on top of OpenStack.

Brian Stevens

CTO and VP, Worldwide Engineering, Red Hat

Notes:

Mantra: Upstream First
OpenDaylight: Open-source network-controlled network
- Decouple application workflows from infrastructure workflows
- Network virtualisation – similar to server virtualisation
- Neutron ties together OpenStack and OpenDaylight
CI in OpenShift
- Clone production service
- Jenkins cartridge
- Easy
- Don’t have to modify app, but will change the way you work

How Do We Better Sell DevOps?

In this talk, I will share my top lessons learned over my years studying high performing IT organizations on how to sell the value of DevOps, and help other stakeholders and executives have their own a-ha moments. I will talk about specific stories about the circumstances that led to these a-ha moments, how they created DevOps champions in surprising places (e.g., Development, CTOs, Product Management, UX, Infosec) in organizations you'll recognize, and how they enabled implementing DevOps patterns that had awesome results.

Gene Kim

Author "The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win", IT Revolution Press

Nobody Has To Die Today: Keeping The Peace With The Other Meat Sacks

A frank (and, frankly, loud) discussion about the kinds of miscommunication that arise between developers and operations, how it leads to trouble and possible ways we can avoid (figurative) violence in the workplace using both social techniques as well as tooling.

Mykel Alvis

Sr. DevOps Consultant, MomentumSI

Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet

http://puppetlabs.com/presentations/vampires-vs-werewolves-ending-war-between-developers-and-sysadmins-puppet

Developers need to be able to write software and deploy it, and often require cutting edge software tools and system libraries. Sysadmins are charged with maintaining stability in the production environment, and so are often resistant to rapid upgrade cycles. This has traditionally pitted us against each other, but it doesn't have to be that way. Using tools like puppet for maintaining and testing server configuration, nagios for monitoring, and jenkins for continuous code integration, Stanford University Library has brokered a peace that has given us the ability to maintain a stable production environment with a rapid upgrade cycle. I'll discuss the individual tools, our server configuration, and the social engineering that got us here.

Bess Sadler

Manager for Application Development, Stanford University Library

Notes:

You don’t take risks with ppl you don’t trust
- Need to start with building interpersonal relationships, building trust
- Let go of the anger
- Recognise common goals
Tactics
- Get to know the people on the other side, take them out for coffee
- Show of good faith, show them your test coverage
- Monitoring with e.g. Nagios
- Goal: Are all of our projects functioning correctly right now?
- Friendly Manual
- Puppet: the ultimate challenge in vampire-werewolf relationships
http://www.codinghorror.com/blog/2010/08/vampires-programmers-versus-werewolves-sysadmins.html
- The art of managing vampires and werewolves, I think, is to ensure that they spend their time not fighting amongst themselves, but instead, using those supernatural powers together to achieve a common goal they could not otherwise. In my experience, when programmers and system administrators fight, it's because they're bored. You haven't given them a sufficiently daunting task, one that requires the full combined use of their unique skills to achieve.
- Remember, it's not vampires versus werewolves. It's vampires and werewolves.

So You've Got Scalability. Now What?

http://puppetlabs.com/presentations/so-youve-got-scalability-now-what

Managing over 10k nodes brings unique challenges. One of them is managing all data in a scalable way, but solving the scalability issue isn't enough. The data must be available and manageable in a user friendly way. This talk is about how we successfully implemented a solution using Hiera, Redis, Sensu, Rails and Grape that made us capable of providing our customers with the ability to not only manage their own data but also build their own applications to manage their infrastructure using our API.

Carla Souza

Senior Engineer, Reliant Security

Notes:

https://github.com/reliantsecurity/hiera-resources

Multi-Provider Vagrant: AWS, VMware, and More

http://puppetlabs.com/presentations/multi-provider-vagrant-aws-vmware-and-more

With Vagrant 1.1+, you can use the same configuration and workflow to spin up and provision machines in VirtualBox, VMware, AWS, RackSpace, and more. You get all the benefits of Vagrant with the power of working in whatever environment you need to. This capability unlocks entirely new use cases for Vagrant that can help better optimize the entire process of developing and testing Puppet code. In this talk, you'll learn about the new multi-provider features, why they exist, and how they can be used. Your life will never be the same again.

Mitchell Hashimoto

Founder, HashiCorp

Notes:

Packer
- If vagrant up becomes a blocker because it takes too long, Packer becomes important
- Multi-cloud portability e.g. Dev->Test->Prod
- Stability and portability:
  - E.g. provision the image with Puppet rather than running Puppet live, so it still works after 12 months
- Handles reboots

Building Data-Driven Infrastructure with Puppet

http://puppetlabs.com/presentations/building-data-driven-infrastructure-puppet

As your Puppet Infrastructure grows, so does the complexity of the Puppet codebase. The complexity of the codebase often creates a scenario where it becomes more time consuming to modify/add to the codebase. Likewise, any new addition or node still may require modifications to the Puppet database, which could include the management of many edge cases. Fortunately, the software industry has been working on developing techniques with code abstraction, refactoring, and software maturity. This talk will focus on how to write scalable modules within Puppet to be used to create Data Driven Infrastructures. In addition, this talk will demonstrate how to structure process/procedure/code to quickly and rapidly scale operations with minimal modifications to Puppet code.

James Fryman

Operations Hacker, GitHub, Inc.

Notes:

https://gist.github.com/jfryman/6310477
- James White Manifesto
- ChatOps
- check-graphite
- HAProxy Snippets
- Refactoring Puppet
- CloudFormation
- Autoloader
- Syslog-NG Model
- NagiosDB
- Bio
Beer-Ops
James White Manifesto
Machine parseable: every input and output should be readable by computer
There is only one system
Systems-thinking
gPanel -> PuppetDB
gPanel self-driven portal CMDB
Machine readable metrics can go right back into the system
Unix Philosophy – apps do one thing really well
Controller
- Think about Puppet as a “controller”
- https://speakerdeck.com/jfryman/refactoring-puppet
- Facter is meant to be info introspection but can cheat and use it as a state engine when you have a controller
Orchestrator
- Chat Ops – Jesse Newland talk
- mcollective
- Puppet is a state machine
- Don’t want to wait for entire Puppet convergence
Metrics
System must be able to self-correct
- check-graphite
Deployable using text files
Modularity
To stop config drift: Level up from Templates to Data Driven
NagiosDB
- Dynamically generate Nagios setups outside of Puppet
System should introspect itself
Refactoring Puppet – least to most specific, modules become systems
CloudFormation
Autoloading
- Nagios-autoload: https://gist.github.com/jfryman/6310732
Modelling
- Puppet has to know about app intimately
- Augeas allows you to build infra dynamically
- Pass structured data into module
- “Turntup” slide
Fencing resources
- Being built but not ready to serve
Constant introspection, feedback loop into the system
What’s missing?
- We don’t have a good language for complex actions
- Need a way to model complex interactions: if-then-else and Nagios event handlers only get you so far
- Predictive Analysis
Data is coming from itself, think about system/feedback loop
Only one system
- Everything machine parsable
- Must be thinking “how do we make computers do my job?”
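
As an illustration of the "templates to data-driven" and NagiosDB notes above, here is a minimal sketch that renders Nagios host definitions from structured node data. It is not the speaker's NagiosDB. The node records, field names and addresses are made up, and in practice the data might come from a CMDB or PuppetDB rather than a literal list.

```python
from typing import Any, Dict, Iterable


def render_host(node: Dict[str, Any]) -> str:
    """Render one Nagios 'define host' object from a node record."""
    lines = [
        "define host {",
        f"    host_name   {node['name']}",
        f"    address     {node['address']}",
    ]
    roles = node.get("roles", [])
    if roles:
        # Each role becomes a hostgroup, so checks can be attached per group.
        lines.append(f"    hostgroups  {','.join(roles)}")
    lines.append("}")
    return "\n".join(lines)


def render_hosts(nodes: Iterable[Dict[str, Any]]) -> str:
    """Render every node, sorted by name so the output is stable between runs."""
    ordered = sorted(nodes, key=lambda node: node["name"])
    return "\n\n".join(render_host(node) for node in ordered) + "\n"


if __name__ == "__main__":
    # Hypothetical inventory data.
    inventory = [
        {"name": "web02", "address": "10.0.0.12", "roles": ["web"]},
        {"name": "web01", "address": "10.0.0.11", "roles": ["web", "canary"]},
        {"name": "db01", "address": "10.0.0.21"},
    ]
    print(render_hosts(inventory), end="")
```

Adding a node then means adding a record to the data, not editing a template or a manifest.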

Friday 2013-08-24 [Saturday BNE time]

Keynote: Stop Hiring Devops Experts (And Start Growing Them)

http://puppetlabs.com/presentations/keynote-stop-hiring-devops-experts-and-start-growing-them

Everyone is putting "devops" on their LinkedIn profile, and everyone is trying to hire them. In this talk, Jez will argue this is not a recruitment problem but an organizational failure. This talk discusses how to grow great people and great organizations, and how the two problems are connected.

Jez Humble

Principal, ThoughtWorks

Notes:

http://es.slideshare.net/michael.sahota/agile-culture-and-adoption-survival-guide
Ops is in top right – control
It’s about the bottom right – cultivation culture
You can’t hire in cultural change
- It’s very hard to change organisations
- Organisations reject change in general
Cultivation culture
- Have to build it into your organisation
- 20% time (3M Post-it notes, Google)
- Can’t just experiment – need to be able to act on it – Kodak
- Pairing: learn only by doing it for 6 months; work with people, not just alongside them
- Focus on experimentation/innovation
  - Okay to fail
- Role of mgmt
  - Create env in which it’s safe to learn
  - How well do we cultivate knowledge?
- Measure ppl on how good they are
  - Toyota Kata
  - Can only become manager when you’ve worked on factory floor for 10-15yrs
  - Focus on experimentation
  - Job of mgr = facilitate ppl doing the work to get better at doing it
- Not effective
  - Training
    - Bring the people with you –make it natural
  - Buying tools
  - DevOps team
    - Creating a silo to fix a silo
    - If you create an org that values learning & ppl enjoy being able to learn and innovate = maybe won’t have a hiring problem
- Taleb 3
  - Resilience is not the opposite of fragile
  - Antifragile is the opposite of fragile
  - People – e.g. Arnie, applying stress makes you stronger
  - Game days e.g. Jesse Robbins Amazon
    - Firedrills
    - Turn off the power and find out what actually happens
    - Google DiRT exercises, earthquake simulation
    - Expose assumptions e.g. Mountain View failing over to Mountain View
  - Netflix
    - Simian army

Keynote: Puppet for Production in WebEx

http://puppetlabs.com/presentations/keynote-puppet-production-webex

Getting started with Puppet configuring an individual machine is straightforward. Managing a cluster of machines across multiple data centers, supporting upgrades while running a 7x24 service, and building for collaboration is significantly more challenging. The WebEx team will discuss the problems and some strategies they are using to manage this complexity.

Reinhardt Quelle

Cloud Services Architect, Cisco/WebEx

Notes:

Sequencing: technical + business
Never run just one of anything
- DC, nodes
- Why cluster: Technical, business (commercial retail vs federal, privacy etc)
Version migration
- By DC
- By Node
Blueprints -> Orchestration
Unix toolchain philosophy
- Fabric/Salt/Ansible is the leatherman pocketknife multitool
- Puppet is the real knife
Masterless Puppet
- E.g. Google
- distribute modules/manifests to nodes
- Copy /etc/puppet/* to each node
- Complete resiliency per node
- No single point of control
- puppet apply --modulepath "..." --execute "include ..."
See also: Sam Bashton “Continuously Integrated Puppet in a Dynamic Environment”
Keep problems small
Push dependencies into Puppet instead of RPMs
- “4am-proofing” – Puppet = transparency
- Favour transparency over DRY
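
The masterless approach noted above comes down to each node running `puppet apply` against its local copy of the modules. A small wrapper might assemble that command as sketched below. The module path and class name are assumptions for illustration, and the sketch only builds and prints the command rather than running it.

```python
import shlex
from typing import List, Sequence


def puppet_apply_command(module_paths: Sequence[str], classes: Sequence[str]) -> List[str]:
    """Build the argument list for a masterless 'puppet apply' run."""
    if not classes:
        raise ValueError("at least one class to include is required")
    # One 'include' statement per class, passed inline via --execute.
    manifest = "\n".join(f"include {name}" for name in classes)
    return [
        "puppet", "apply",
        "--modulepath", ":".join(module_paths),
        "--execute", manifest,
    ]


if __name__ == "__main__":
    # Hypothetical path and class name.
    command = puppet_apply_command(["/etc/puppet/modules"], ["role::webserver"])
    # Printed for inspection; a real wrapper could pass the list to subprocess.run.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems if the command is later executed with `subprocess.run`.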

Keynote: Puppet at Scale – Case Study of PayPal's Learnings

http://puppetlabs.com/presentations/keynote-puppet-scale-%E2%80%93-case-study-paypals-learnings

Large scale and app level management pose challenges to any implementation of puppet. Come and learn some of the challenges PayPal Deployment Systems team faced and how these were overcome.

Sr Dev Manager, PayPal

Notes:

Staging = mini paypal.com
Commits happening every few seconds
- How do you manage dependencies?
Web API for deployment
OpenStack
“Project Velocity”
Who looks after Puppet in the middle of the night?
- 3,000 developers?
Deployment system where puppet coding is not required to deploy new apps
Ninja engine
- Takes list of apps you want to install
- Assemble list of
- Discover dependencies
- Generate puppet resources from dependency graph
- Then execute them
Caches dynamic resources for next run
What packages to install?
- Roles & Labels
System Hierarchy
- ENC -> Hiera
- Web tool to visualise Hiera data
- REST API for CRUD over Hiera data
Scaling ActiveMQ
- Problem with mcollective beyond dev/test where multiple puppetmasters/MQ
- Mcollective gave inconsistent results
- Were using MQ cluster through LB – connections would time out
- Removing LB fixed
- When ActiveMQ host dies have to reconfigure clients – Use Puppet
Mcollective at Scale
- Mcollective is equally useful as Puppet
- Paypal heavily depends on mcollective
- Replaces all SSH scripts
- Use it to:
  - Query systems
  - Verify package versions
  - Kick off on-demand puppet runs
  - Ssh script replacement
- REST API enables Mcollective to web and other tools
  - Powerful = careful approval required
Worked with PuppetLabs to create “progress” module to work out how long a deploy will take
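
The "Ninja engine" steps noted above (take a list of apps, discover dependencies, generate Puppet resources from the dependency graph) resemble a topological sort. The sketch below shows that general technique only. PayPal's actual engine was not shown in detail, and the app names and dependency map here are invented. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence, Set


def resolve(requested: Sequence[str], depends_on: Dict[str, Sequence[str]]) -> List[str]:
    """Return the requested apps plus transitive dependencies, dependencies first."""
    needed: Set[str] = set()
    stack = list(requested)
    while stack:  # discover everything reachable from the requested apps
        app = stack.pop()
        if app in needed:
            continue
        needed.add(app)
        stack.extend(depends_on.get(app, ()))
    # TopologicalSorter expects a mapping of node -> predecessors (its dependencies).
    graph = {app: set(depends_on.get(app, ())) for app in sorted(needed)}
    return list(TopologicalSorter(graph).static_order())


def to_resources(order: Sequence[str]) -> List[str]:
    """Emit one Puppet package resource per app, in install order."""
    return [f"package {{ '{name}': ensure => installed }}" for name in order]


if __name__ == "__main__":
    # Hypothetical dependency data.
    deps = {"shop": ["libpay", "nginx"], "libpay": ["openssl"]}
    for resource in to_resources(resolve(["shop"], deps)):
        print(resource)
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which is a useful failure to surface before anything is installed.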

Keynote: VMware vCHS, Puppet, and Project Zombie

http://puppetlabs.com/presentations/keynote-vmware-vchs-puppet-and-project-zombie

Nicholas Weaver

Cloud Automation Architect, Hybrid Cloud Service, VMware

Nicholas Weaver is the Cloud Automation Architect for VMware's vCloud Hybrid Service (vCHS) platform and the primary architect behind the vCHS automation framework (Project Zombie). He is also a co-creator of the PuppetLabs Razor project and many VMware-specific free tools. He previously worked in the CTO office for EMC, in the EMC field as a vSpecialist, and as an infrastructure engineer in financial, media, and retail companies. Nick loves software-driven control, hacking prototypes together...

Notes:

Automation = Effort Evolution
Why is it important?
- Warehouse
Resiliency = we expect things to fail
- Can never assume anything is going to stay up
Project Zombie
Puppet has critical things for VMware to choose
- Mcollective
- VM support
- Cassandra
- Netflix Astyanax
- JRuby – good middle ground for dev and ops
RabbitMQ
Modules
Rez = globally distributed “refrigerator”
- Difficult to manage resources at global distributed scale
- Automation = baking
- Millions of resources
- Razor feed into Rez to manage state
- REST API
Engine = “the chef”
- Orchestration
- Wrote their own Operational-based language
- Controls flow and concurrency
- ZED (Zombie Engine DSL)
- Distributed and location-awareness
- B = Broker
- P = Processor

Continuously Integrated Puppet in a Dynamic Environment

http://puppetlabs.com/presentations/continuously-integrated-puppet-dynamic-environment

This talk will show how we deploy Puppet without a Puppetmaster on an autoscaling Amazon Web Services infrastructure. Key points of interest: - Masterless Puppet - Use of Jenkins for Puppet manifest testing and environment promotion (test->staging->production) - Puppet integration with Amazon CloudFormation

Sam Bashton

Director, Bashton Ltd

Matt Callanan's Blog

Tuesday, September 24, 2013

PuppetConf 2013 LiveStreaming Notes

