The Perfect DevOps Storm - Paul Gillespie

Posted on Oct 28, 2017


A write-up of Paul Gillespie’s talk at DevOpsDays Edinburgh 2017 about the DevOps storm that ensued at Skyscanner.

Introduction

First, some terminology:

A squad is equivalent to a team.
A tribe is a group of squads with similar goals.
Enablement squads and tribes provide tools and services that allow other squads and tribes to succeed.

What is the perfect devops storm?

The perfect devops storm is what ensues when you attempt to converge a large number of changes impacting the organisational structure, the delivery and operational processes, and the technology in a short space of time.

Why make changes?

To increase the speed of execution
To go from idea to user in an afternoon
To stay ahead of the market

To support these changes, though, you need the right processes and the right technology.

Changes at Skyscanner

Over the past few years there have been 12 major changes to the organisational structure, the processes and the technology in use. This does not include product changes, changes in the market (e.g. the shift to mobile) and other normal, everyday evolutions.

2014

Squadification - This is the biggest change that Skyscanner has introduced. It involved a complete restructuring of the business.
Continuous delivery
Move to open source languages and tools - This was a big transition away from .NET, which wasn’t going to allow the ease and pace of development that was becoming necessary.
Log all the things - Skyscanner built a data platform to enable them to log absolutely everything and to analyse that data to provide meaningful feedback to the squads.

2015

Move from datacentres to AWS - Skyscanner saw that running their own infrastructure wasn’t going to enable their squads to move quickly and therefore took a massive gamble on moving their full operation over to AWS.
Experiment-backed software development - Skyscanner is concerned with conversion, i.e. getting visitors to buy flights and other products through them, so they are constantly evaluating what works and what doesn’t. To support this they built their own experimentation platform, which better supports A/B tests and lets them run hundreds of experiments a month on their users.
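The talk does not describe how Skyscanner’s experimentation platform works internally. One common building block of A/B testing platforms is deterministic bucketing: hashing a user ID together with an experiment name so the same user always sees the same variant without storing any assignment state. The sketch below assumes that approach; the function name `assign_variant` and the SHA-256 hashing scheme are illustrative choices, not Skyscanner’s implementation.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically assign a user to one variant of an experiment (hypothetical helper)."""
    if not variants:
        raise ValueError("an experiment needs at least one variant")
    # Include the experiment name in the key so a user's bucket in one
    # experiment is independent of their bucket in another.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer and map it to a variant.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

Because the assignment is a pure function of its inputs, any service in a microservice estate can compute it independently and get the same answer, which is what makes running many concurrent experiments cheap.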

2016

Age of enablement - If you want your developers to move quickly, they need tools and services that let them focus on what matters. Skyscanner created enablement squads and tribes to do exactly that.
Microservices - Moving all applications over to microservices allowed the squads to be smaller and more focused.
Rapid deployment - A benefit of the smaller squads and more focused developers was an increase in the number of deployments, currently at roughly 10k a day.
You build it you run it - This move means that at all times someone who is knowledgeable about a particular service is available to support it, either during working hours or via an on-call system. More reliable services tend to be created as people don’t want to be woken up.

2017

Building distributed cloud-native services needs a better environment - The cloud has enabled Skyscanner to move at an immense pace and keep up with customer demand, but it poses its own problems. Infrastructure can and does disappear, networks sometimes behave strangely and the software is constantly changing. If you are going to do this, you need to build an environment that insulates your software from a lot of the cloud weirdness.
Principles and Standards - Focus on defining the ways that your squads should work and how that applies in reality. You still want to give them autonomy but you want it to be in a controlled manner that achieves the best results.
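The talk does not say which techniques Skyscanner’s environment uses to absorb cloud failures. A widely used one for transient faults such as dropped connections or timeouts is retrying with capped exponential backoff and random jitter. The sketch below assumes that technique; `retry_with_backoff` and its parameters are hypothetical, and treating only `ConnectionError` and `TimeoutError` as retryable is an illustrative default.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount up to the ceiling so many
            # clients don't retry in lockstep against a recovering service.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Errors outside `retryable` (for example a `ValueError` from bad input) propagate immediately, since retrying them would not help. The injectable `sleep` parameter exists only to make the behaviour easy to test.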

Lessons Learnt

Your organisational structure is an artificial construct. It needs to evolve and adapt over time. It should not be static. As business needs change, change your organisational structure. Inflexibility kills.
With great power comes great responsibility. Autonomy without accountability is just a vacation. Allow your squads to have freedom of technology, vision and direction but make sure they understand the responsibility will lie with them to maintain and support this.
Autonomy != Ability. Just because you give a team autonomy doesn’t mean they know what to do or can do what you expect of them. You need to set boundaries, have basic principles to guide people and most of all you need to be able to train people to work autonomously.
Dedicated enablement squads/tribes for the win. Creating squads and tribes that are focused on making sure others are able to do their job easily and effectively is essential. Think developer enablement and employee enablement.
Change fatigue is a very real problem. It is very easy to inflict a large amount of unintended technical debt on a large number of squads by changing too regularly. If you are going to change technologies think about the full cost of that change and not just the benefits of making it.
Journeys not destinations. Continuous improvement should be a core competency for everyone. Think about where you are going and how to get everyone there.
If you build it they will come... maybe. Try to avoid thinking you know best just because you work on a project all day long. Your users will often surprise you and prove you wrong.
You build it, you run it and you definitely own it. This doesn’t mean you need to manage your full stack. It’s OK, and at times helpful, to rely on services run by others, but remember that you don’t run or own them, so build resilience and error handling around them to let your service carry on without them.
Engineering standards are an enabler. Having basic standards that people should be following will help to eliminate duplication and make your squads more effective with their time.
Better to be restrictive at first with standards. It is far easier to give than it is to take away so start off restrictive and over time you can adjust to find the equilibrium.
Measure what you preach. Track adoption of enablement processes, services and standards. Knowing how well something is going down with your squads will help you develop that offering or find pain points that you can work to resolve.
Internal open sourcing scales delivery. There will always be situations where features require changes outwith the original squad’s code, and that is OK, but make it easy for this to happen.
Codify your standards. The best standards are the ones that you get for free.
Microservices are not a way to pad your CV or get a book deal. They require a lot of work and a lot of thought. Don’t make the decision just for the sake of it.
Not all the technologies. Sometimes less is more. Use the right tool for the right job, but don’t try to fit the latest shiny into a project only to discard it when the next shiny comes along in a few months.
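The “you build it, you run it and you definitely own it” lesson above says to build resilience around services you depend on but don’t own. One simple way to do that is graceful degradation: if the dependency fails, serve the last good response, or a safe default, instead of failing your own request. The sketch below assumes that pattern; `FallbackClient` and the `fetch` callable it wraps are hypothetical names, not anything described in the talk.

```python
import logging
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class FallbackClient(Generic[T]):
    """Wrap a call to a dependency you don't own so its failures degrade gracefully (hypothetical)."""

    def __init__(self, fetch: Callable[[], T], default: T) -> None:
        self._fetch = fetch                  # call to the downstream service
        self._default = default              # safe value when nothing better exists
        self._last_good: Optional[T] = None  # most recent successful response

    def get(self) -> T:
        try:
            value = self._fetch()
        except Exception:
            # The dependency is down or misbehaving: record it, keep serving.
            logger.warning("dependency call failed; serving fallback", exc_info=True)
            return self._last_good if self._last_good is not None else self._default
        self._last_good = value
        return value
```

Catching a broad `Exception` here is a deliberate choice for a boundary you don’t control; the warning log keeps the failure visible to whoever is on call rather than silently hiding it.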

Summary

Continuous improvement is a journey, not a destination. Be responsible with your autonomy and think hard about what would be best for your users; don’t be afraid to ask them or to try things out on them. Production is as much an experimentation platform as development. Think about your standards and what you would like to see adopted. Make it easy for everyone to contribute to these standards and don’t be autocratic. Invest in enablement as early as possible: you will not regret investing in making squads more effective. Change is the only constant, so learn to live with it. Don’t be rigid; if something doesn’t work, change it. Adapt and evolve.