Actions

MLOps

Revision as of 20:14, 4 January 2021 by User (talk | contribs)

MLOps (Machine Learning Operations) is a set of practices that provide determinism, scalability, agility, and governance in the model development and deployment pipeline. This new paradigm focuses on four key areas within model training, tuning, and deployment (inference): machine learning must be reproducible, it must be collaborative, it must be scalable, and it must be continuous.[1]

MLOps is modeled on the existing discipline of DevOps, the modern practice of efficiently writing, deploying and running enterprise applications. DevOps got its start a decade ago as a way warring tribes of software developers (the Devs) and IT operations teams (the Ops) could collaborate. MLOps adds to the team the data scientists, who curate datasets and build AI models that analyze them. It also includes ML engineers, who run those datasets through the models in disciplined, automated ways. It’s a big challenge in raw performance as well as management rigor. Datasets are massive and growing, and they can change in real time. AI models require careful tracking through cycles of experiments, tuning and retraining. So, MLOps needs a powerful AI infrastructure that can scale as companies grow.[2]


MLOps combine machine learning, applications development and IT operations
MLOps
source: Neal Analytics


The Guiding Principles of MLOps[3]

When approached from the perspective of a model – or even ML code, machine learning cannot be developed truly collaboratively as most of what makes the model is hidden. MLOps encourages teams to make everything that goes into producing a machine learning model visible – from data extraction to model deployment and monitoring. Turning tacit knowledge into code makes machine learning collaborative.

  • Reproducible: Machine learning should be reproducible.

Data scientists should be able to audit and reproduce every production model. In software development, version control for code is standard, but machine learning requires more than that. Most importantly, it means versioning data as well as parameters and metadata. Storing all model training related artifacts ensures that models can always be reproduced.

  • Continuous: Machine learning should be continuous.

A machine learning model is temporary. The lifecycle of a trained model depends entirely on the use-case and how dynamic the underlying data is. Building a fully automatic, self-healing system may have diminishing returns based on your use-case, but machine learning should be thought of as a continuous process and as such, retraining a model should be as close to effortless as possible.

  • Tested: Machine learning should be tested & monitored.

Testing and monitoring are part of engineering practices, and machine learning should be no different. In the machine learning context, the meaning of performance is not only focused on technical performance (such as latency) but, more importantly, predictive performance. MLOps best practices encourage making expected behavior visible and to set standards that models should adhere to, rather than rely on a gut feeling.

  1. Definition - What does MLOps Mean? Paperspace Blog
  2. What is MLOps? Nvidia blog
  3. The Guiding Principles of MLOps Valohai