Machine Learning To Optimise DevOps Practices - Peter Varhol

Posted on Oct 23, 2017

DevOpsDays Edinburgh Conference DevOps Machine Learning

Write-up of Peter Varhol’s talk about using machine learning to optimise DevOps practices, from DevOpsDays Edinburgh 2017.

Building learning into monitoring and feedback

Introduction - What is machine learning?

Machine learning uses layered algorithms, either linear or non-linear, that adjust their parameters based on feedback from known data. Algorithms can be:

Fixed - algorithms that do not change once deployed
Adaptive - algorithms that continually adjust to new data
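To make the distinction concrete, here is a minimal sketch rather than anything shown in the talk. It assumes a made-up stream of response times and two hypothetical detectors: `FixedThreshold`, whose limit never changes, and `AdaptiveThreshold`, which keeps moving its baseline as new data arrives. All names and numbers are illustrative assumptions.

```python
class FixedThreshold:
    """Flags values above a limit chosen before deployment; it never changes."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def is_anomalous(self, value: float) -> bool:
        return value > self.limit


class AdaptiveThreshold:
    """Flags values well above an exponentially weighted moving average."""

    def __init__(self, initial_mean: float, alpha: float = 0.2, factor: float = 1.5) -> None:
        self.mean = initial_mean
        self.alpha = alpha    # how quickly the baseline follows new data
        self.factor = factor  # how far above the baseline counts as anomalous

    def is_anomalous(self, value: float) -> bool:
        anomalous = value > self.mean * self.factor
        # The baseline is updated after every observation, so it keeps adjusting.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return anomalous


fixed = FixedThreshold(limit=150.0)
adaptive = AdaptiveThreshold(initial_mean=100.0)

# Hypothetical response times (ms) that drift upwards gradually.
latencies = [100, 110, 120, 135, 150, 165, 180]
fixed_flags = [fixed.is_anomalous(v) for v in latencies]
adaptive_flags = [adaptive.is_anomalous(v) for v in latencies]
```

With this data the fixed detector starts flagging once the drift passes 150 ms, while the adaptive one follows the drift and only reacts to sudden jumps. Which behaviour is right depends on whether the expectation is meant to stay constant.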

Machine learning is usually used as part of a larger system.

Why fixed systems?

Fixed systems are ideal when the problem domain is static and the expectations remain constant. Examples include:

Transportation - self-driving cars, aeroplanes, drones
Medical - diagnosis systems

In the above examples the right answer is known under most conditions and therefore the algorithms will remain valid over time.

Why adaptive systems?

Adaptive systems are ideal when the problem domain is not static and the right answer varies over time. They can also be used to optimise a particular result. Examples of adaptive systems include:

Airline pricing
- Ticket prices change multiple times per day based on demand
- Airlines are trying to incentivise you to fill their plane but they want you to do it at the highest possible price
Ecommerce systems
- Recommendation engines are designed to offer you additional products that will increase your overall spend
- As you buy more, the right answer about what to show you changes, because the system has more information about what you might or might not purchase

In the above examples the right answer changes over time and with various other factors, so an algorithm that can adapt to suit is useful in these circumstances.

How does this apply to devops?

Devops practices generate data:

during development - agile metrics, Jira issues, test metrics etc
during CI - system test metrics
during CD - quality metrics
post deployment - availability, performance, usage logs etc

The problem with this data is that there is too much of it, it's too detailed and it's too spread out. You need to set thresholds and analyse the data as a whole in order to get results.

Focus on monitoring

In order to do this you should be gathering ongoing data around:

real user monitoring
application performance
synthetic tests

Once you have this data you can apply fixed or adaptive algorithms based on need which will enable you to tackle the backend of devops:

identifying unhealthy trends
diagnosing failures/poor performance
recommending action
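As an illustration of the first item, spotting an unhealthy trend, one simple approach is to compare the most recent window of a metric against its earlier baseline. This is a sketch under assumptions, not the speaker's method: the function name `is_trending_up`, the window size and the 20% tolerance are all hypothetical choices.

```python
from statistics import mean


def is_trending_up(samples: list, window: int = 5, tolerance: float = 1.2) -> bool:
    """Return True if the latest window averages more than `tolerance` times the baseline."""
    if len(samples) < 2 * window:
        raise ValueError("need at least two windows of samples")
    baseline = mean(samples[:-window])  # everything before the latest window
    recent = mean(samples[-window:])    # the latest window only
    return recent > baseline * tolerance


# Hypothetical page load times (ms) from real user monitoring.
stable = [100] * 10
degrading = [100] * 5 + [110, 120, 130, 140, 150]
```

Here `is_trending_up(stable)` is false and `is_trending_up(degrading)` is true, because the recent average of 130 ms exceeds the 100 ms baseline by more than 20%.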

Predictive analysis

Big data makes predictions possible. By looking at past events, what went wrong and how we fixed it, we can learn to predict what might happen in the future. We can rely on past data and use a fixed algorithm to predict events, but clear goals are needed.
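A small example of a fixed algorithm with a clear goal: fit a straight line to past samples and estimate how long until a metric reaches a limit. This is only a sketch. The function `steps_until_limit`, the disk usage figures and the 90% limit are assumptions made for illustration, and a straight line is the simplest possible model of past behaviour.

```python
from typing import Optional


def steps_until_limit(samples: list, limit: float) -> Optional[float]:
    """Fit a least-squares line to the samples and estimate how many further
    samples it takes to reach `limit`. Returns None if the metric is not rising."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_index = (limit - intercept) / slope
    # Report the distance from the latest sample, never a negative number.
    return max(0.0, crossing_index - (n - 1))


# Hypothetical daily disk usage in percent; the goal is to act before 90%.
disk_usage = [50, 55, 60, 65, 70]
days_left = steps_until_limit(disk_usage, limit=90)
```

The goal (stay below 90%) is what turns the fitted line into something actionable. Without it, the slope is just a number.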

Techniques

There are a couple of machine learning techniques that can be applied:

neural networks
genetic algorithms

Each technique is useful in different situations, so you should know what you are looking to get out of it before making use of either one.

Neural networks

Neural networks are layered algorithms whose variables are adjusted through a learning process: training with known inputs and outputs.
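The training idea can be sketched with the smallest possible building block: a single sigmoid neuron rather than a full layered network. The training data below is invented (normalised CPU load and error rate mapped to "needs attention" or not), and the function names, learning rate and epoch count are assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights: list, bias: float, inputs: tuple) -> float:
    """Output of one neuron: weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(examples: list, epochs: int = 2000, learning_rate: float = 0.5):
    """Adjust the weights and bias so the outputs move towards the known targets."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Nudge each variable in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical known inputs (cpu load, error rate) and outputs (1 = needs attention).
examples = [
    ((0.1, 0.1), 0), ((0.2, 0.3), 0), ((0.3, 0.2), 0),
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1),
]
weights, bias = train_neuron(examples)
```

After training, `predict(weights, bias, (0.85, 0.8))` is close to 1 and `predict(weights, bias, (0.15, 0.2))` is close to 0. A real network stacks many such neurons in layers, but the adjust-on-feedback loop is the same idea.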

Genetic algorithms

Genetic algorithms are based on the principle of natural selection. You start with a range of solutions, try each of them, and combine the best alternatives to produce the output, at which point you can start all over again.
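The try, select, combine and repeat loop might look like the sketch below. Everything in it is an assumption made for illustration: the candidates are hypothetical (worker count, cache size in MB) pairs, and `fitness` is a made-up stand-in for a real measurement such as observed latency.

```python
import random


def fitness(candidate: tuple) -> float:
    """Hypothetical score (higher is better); a real system would measure this."""
    workers, cache_mb = candidate
    return -((workers - 8) ** 2 + (cache_mb - 64) ** 2 / 16)


def evolve(generations: int = 40, population_size: int = 20, seed: int = 1) -> tuple:
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [(rng.randint(1, 32), rng.randint(1, 256)) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: try every candidate and keep the fitter half unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            # Crossover: combine one setting from each parent.
            child = [mother[0], father[1]]
            # Mutation: occasionally nudge one setting to explore nearby options.
            if rng.random() < 0.3:
                i = rng.randrange(2)
                child[i] = max(1, child[i] + rng.choice((-2, -1, 1, 2)))
            children.append(tuple(child))
        # Start all over again with the survivors and their offspring.
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
```

Because the fitter half is carried over unchanged each generation, the best candidate found can never get worse from one round to the next. That is a design choice of this sketch, not a requirement of genetic algorithms in general.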

Devops data can be particularly suited to training neural networks. By feeding in data about application performance, failures, traffic levels etc., you can build an algorithm that can make decisions about the actions that should be taken.

Decisions are complex

Decisions can only ever be as good as the data supplied. Give a machine the same data as an expert and it can be as effective in making decisions. The machine can also learn over time to help with predictability. However, the machine should not be considered a replacement for the expert; instead it should be seen as an assistant.

Intelligent systems can be wrong. Sometimes the problem domain is ambiguous or there is no single right answer. This is where the expert can apply knowledge and experience to make the right decision: a machine sees all events as equal, but an expert can draw distinctions between them. Ultimately you can’t automate what you don’t understand.