Introduction

This document outlines an AIOps Maturity Model to help organizations assess and improve their Machine Learning Operations capabilities. It grew out of my own frustration that no existing model fit the real, end-to-end experience of data science and operations working together, covering both ‘conventional’ ML and LLM-based systems, and the completely different way you have to think about the latter.

This was originally published internally around May ‘24 and then presented at NIDC as an ‘Eye Test Model’. I promised that I’d eventually publish it; this is it, dusted off and tidied up for public consumption.

The model is structured across six key capability areas and five maturity levels, providing a roadmap for organizations to evolve their AIOps practices. It draws on the reference materials listed below, is far from original, and is probably already out of date.

References

Capability Areas

  • People: Collaboration and communication among data scientists, data engineers, operations teams and software engineers.
  • Data Management and Exploration: Handling of data sources, data classification, mobility and discovery processes.
  • Generic Model Creation: Data gathering, compute management, and experiment & feedback tracking.
  • Generic Model Deployment and Release: Processes for deploying and releasing models.
  • Large Language Model Ops: Management and deployment of large language models, natural language evaluation, and prompt engineering.
  • Application Integration and Maintenance: Integration of models into applications and maintenance practices.

Maturity Model

People

| Level | Data Scientists | Data Engineers | Software Engineers |
| --- | --- | --- | --- |
| Initial | Siloed, not in regular communication with the larger team | Non-existent | Siloed, receive models remotely from the other team members |
| Minimal | Siloed, occasionally participate in one-way ‘demonstrations’ to the larger team | Siloed, not in regular communication with the larger team | Siloed, receive models remotely from the other team members |
| Procedural | Working directly with data engineers to convert experimentation code into repeatable scripts/jobs | Working with data scientists | Siloed, receive models remotely from the other team members, but have visibility of model pipelines etc. |
| Innovative | Working directly with data engineers to convert experimentation code into manageable services/pipelines | Working with data scientists and software engineers to manage inputs/outputs | Working with data engineers to automate model integration into application code |
| Leading | Working directly with data engineers to convert experimentation code into manageable services/pipelines; working with software engineers to identify markers for data engineers | Working with data scientists and software engineers to manage inputs/outputs | Working with data engineers to automate model integration into application code; implementing post-deployment metrics gathering |

Data Management and Exploration

| Level | Data Sources | Data Store | Data Sensitivity | Data Access | ETL Tasks |
| --- | --- | --- | --- | --- | --- |
| Initial | Disparate data sources with unaligned identifiers/taxonomies | No shared non-production data stores | Un/under-specified, or merely presumed | Individual/local, case-by-case and dataset-by-dataset | Largely script-driven |
| Minimal | Disparate data sources with unaligned identifiers/taxonomies | Shared experimental/dev unstructured data store (with minimal ACLs) | Specified by convention but unenforced | Individual/local, case-by-case and dataset-by-dataset, or ‘need to know’ | Driven by version-controlled transformations |
| Procedural | Disparate data sources with common external identifier mappings / shared but unenforced taxonomies | Common downstream unstructured data store with basic dataset/user-level ACLs | Enforced at the dataset level, specified but unenforced at the attribute level, with informal rules around aggregate sensitivity | Group/RBAC via dataset-specific interfaces | Driven by version-controlled & release-managed transformations, with replica staging deployments for testing |
| Innovative | Aligned/shared data sources with common entity identifiers and unified taxonomies | Common downstream unstructured and analytical data stores with transparent row/entity-level ACLs | Enforced at the row/entity level, specified but unenforced at the attribute level, with formal rules around aggregate sensitivity | Data-catalog-driven discoverability, RBAC for access, ‘need to know’ pathway established and auditable | Driven by version-controlled & release-managed transformations, with replica staging deployments for testing |
| Leading | Aligned/shared data sources with common entity identifiers and unified taxonomies | Common downstream unstructured and analytical data stores with transparent attribute-level ACLs | Enforced at the attribute level, with formal rules around aggregate sensitivity | Universal schema discovery with base RBAC access; automated and audited ‘need to know’ requests; synthetic data for sensitive/confidential streams | Driven by CI/CD transformations, with replica staging deployments for testing |
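
As a rough illustration of what ‘version controlled & release managed transformations’ can look like in practice, here is a minimal Python sketch of a transformation that maps local identifiers onto a common entity identifier and reduces a sensitive attribute before loading downstream. The record shape, identifier mapping, and sensitivity rule are invented for illustration, not taken from any particular platform.

```python
"""Minimal sketch of a version-controlled ETL transformation (names are hypothetical)."""
from dataclasses import dataclass


@dataclass
class SourceRecord:
    local_id: str   # source-specific identifier, unaligned across systems
    postcode: str   # treated as sensitive in this example
    age: int


def transform(records: list[SourceRecord], id_mapping: dict[str, str]) -> list[dict]:
    """Map local identifiers to a common entity identifier and coarsen the
    sensitive attribute into an age band before loading downstream."""
    return [
        {"entity_id": id_mapping[r.local_id], "age_band": (r.age // 10) * 10}
        for r in records
        if r.local_id in id_mapping
    ]


if __name__ == "__main__":
    # In a replica staging deployment this would run against a sampled copy of the source.
    sample = [SourceRecord("abc-1", "BT1 1AA", 34)]
    print(transform(sample, {"abc-1": "ENTITY-0001"}))
```

Because the transformation is an ordinary, importable function, it can be unit-tested and promoted through the same release process as any other code.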

Generic Model Creation

| Level | Data Gathering | Compute Management | Experiment Tracking | End Result |
| --- | --- | --- | --- | --- |
| Initial | Manually | Likely not managed | Not predictably tracked | Single model file manually handed off with inputs/outputs |
| Minimal | Automatically, by per-experiment data pipelines | Managed by team | Not predictably tracked | Training code version controlled; single binary model file manually handed off with inputs/outputs |
| Procedural | Automatically, by shared data pipelines/feature store | Managed as a shared ops capability | Tracked within teams | Both training code and resulting models are version controlled, possibly release managed |
| Innovative | Automatically, by shared data catalog/feature store | Managed as a budgeted and tracked capability | Tracked within teams with shared experiment repositories | Both training code and resulting models are version controlled, release managed, and security tested, with A/B or blue/green deployments; evaluation feedback available to the originating team at staging |
| Leading | Automatically, by distributed data mesh | Managed as a cost centre with data teams as ‘customers’ | Tracked and published internally as derived data products | Retraining triggered automatically based on production metrics; both training code and resulting models are version controlled; multiple model versions deployed at once with continuous evaluation feedback available to the team in production |
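
To make the ‘Tracked within teams’ column concrete, here is a minimal sketch of experiment tracking assuming MLflow as the tracking backend; the experiment name, model, and parameters are illustrative rather than prescriptive.

```python
"""Minimal sketch of team-level experiment tracking, assuming MLflow."""
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A shared, team-visible experiment rather than per-laptop notebooks.
mlflow.set_experiment("churn-baseline")

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    mlflow.log_params(params)  # record the training configuration with the run
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # version the resulting model artifact
```

The same pattern scales to the Innovative level by pointing the tracking URI at a shared server so runs become a team-wide, queryable record rather than local files.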

Generic Model Deployment and Release

| Level | Process | Scoring Script | Release Management |
| --- | --- | --- | --- |
| Initial | Manual | Might be manually created well after experiments, not version controlled | Handled by data scientist or data engineer alone |
| Minimal | Manual | Might be manually created well after experiments, likely version controlled | Handed off to software engineers |
| Procedural | Automatic | Version controlled, with tests | Managed by software engineering team |
| Innovative | Speculative | Triggered by anomaly & correlation detection; version controlled, with tests | Managed by continuous delivery (CI/CD) pipeline |
| Leading | Generative | Triggered by non-statistical events; version controlled, with tests | Managed by continuous integration and CI/CD pipeline |
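
The ‘version controlled, with tests’ cell is worth making concrete: a scoring script kept alongside its tests can move through the same release process as application code. The sketch below assumes a pickled scikit-learn-style model and pytest-style tests; the artifact path and model interface are hypothetical.

```python
"""Minimal sketch of a version-controlled scoring script plus its test."""
import pickle
from pathlib import Path

import numpy as np

MODEL_PATH = Path("artifacts/model.pkl")  # hypothetical release-managed artifact


def load_model(path: Path = MODEL_PATH):
    """Load the released model artifact."""
    with path.open("rb") as f:
        return pickle.load(f)


def score(model, features: list[list[float]]) -> list[float]:
    """Return the positive-class probability for each feature row."""
    return model.predict_proba(features)[:, 1].tolist()


# test_score.py -- lives next to the scoring script and runs in CI on every release
def test_score_returns_probabilities():
    class StubModel:  # stand-in so the test needs no model artifact
        def predict_proba(self, rows):
            return np.array([[0.3, 0.7]] * len(rows))

    result = score(StubModel(), [[1.0, 2.0]])
    assert len(result) == 1
    assert 0.0 <= result[0] <= 1.0
```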

Large Language Model Ops

| Level | Discovery and Testing | Model/Inference Resources | Prompt Management | Deployment | Monitoring |
| --- | --- | --- | --- | --- | --- |
| Initial | Organic discovery of models and testing prompts | | | | Basic lab-driven feedback evaluation and monitoring |
| Minimal | | Shared model/inference resources | Iterative model augmentation with prompt engineering | Structured deployment | Prompt-based feedback evaluations |
| Procedural | | Centralized model/inference resources | Versioned prompt management with RAG / tool calling | Release-driven deployment | Structured deployment and inference-based, feedback-driven evaluations |
| Innovative | Consistently evaluating new models | Model serving/inference ‘as a service’ with resources under IaC | Comprehensive prompt management | Real-time deployment | Advanced monitoring and automated alerts |
| Leading | | | | Seamless, collaborative environment for CI/CD | Fully automated monitoring and model/prompt refinement |
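
As a sketch of what ‘versioned prompt management’ can mean at the Procedural level, prompts can be treated like any other release-managed artefact: kept in version control, referenced by an explicit revision, and rendered at inference time. The registry, template names, and templates below are purely illustrative.

```python
"""Minimal sketch of versioned prompt management; names and templates are illustrative."""

PROMPTS = {
    # Prompts live in version control; a deployment pins one explicit revision of each.
    ("summarise_ticket", "v1"): "Summarise the following support ticket:\n{ticket}",
    ("summarise_ticket", "v2"): (
        "Summarise the following support ticket in three bullet points, "
        "quoting the customer where relevant:\n{ticket}"
    ),
}


def render(name: str, version: str, **variables: str) -> str:
    """Look up a pinned prompt revision and fill in its variables."""
    return PROMPTS[(name, version)].format(**variables)


if __name__ == "__main__":
    print(render("summarise_ticket", "v2", ticket="My order #123 never arrived."))
```

Pinning prompts by revision is what makes release-driven deployment and inference-based evaluation possible: you always know exactly which prompt produced which output.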

Application Integration and Maintenance

| Level | Expertise Reliance | Integration Tests | Release Process | Application Code Tests |
| --- | --- | --- | --- | --- |
| Initial | Heavily reliant on data scientist expertise to implement | | One-off releases | |
| Minimal | Heavily reliant on data scientist expertise to implement model | Basic integration tests exist for the model | Repeated manual releases | Unit tests |
| Procedural | Data scientist expertise required, but co-development with SMEs | Basic integration tests exist for the model | Automated | Unit tests |
| Innovative | Less reliant on data scientist expertise to implement model; SMEs empowered with ‘hands off’ model proposals | Unit and integration tests for each model release | Automated, in regular release/build pipelines | Unit/integration tests |
| Leading | SMEs proposing models that can go to production if they pass ‘gates’ established by data science/ops | Unit and integration tests for each model release | Continuous | Unit/integration tests |
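
One way to read the ‘gates’ in the Leading row is as automated checks that a proposed model must pass before it can be released. A minimal sketch, assuming pytest and an invented hold-out set and threshold:

```python
"""Minimal sketch of a release 'gate' test (run under pytest); data and threshold are invented."""


def load_candidate_model():
    """Stand-in for loading the proposed model artifact from the release pipeline."""
    class Majority:  # trivially predicts the majority class, for illustration only
        def predict(self, rows):
            return [1 for _ in rows]
    return Majority()


# A tiny, invented hold-out set; in practice this would be an agreed golden dataset.
GOLDEN_SET = [([0.1, 0.2], 1), ([0.9, 0.4], 1), ([0.5, 0.5], 0)]


def test_candidate_meets_accuracy_gate():
    model = load_candidate_model()
    rows = [features for features, _ in GOLDEN_SET]
    labels = [label for _, label in GOLDEN_SET]
    accuracy = sum(p == y for p, y in zip(model.predict(rows), labels)) / len(labels)
    assert accuracy >= 0.6, "candidate model fails the agreed release gate"
```

Because the gate is just a test in the release pipeline, an SME-proposed model either clears the agreed bar automatically or is rejected without needing a data scientist in the loop for every release.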