Introduction

This document outlines an AIOps Maturity Model to help organizations assess and improve their Machine Learning Operations capabilities. It grew out of my own frustration that no existing model fit the real, end-to-end experience of data science and operations working together, covering both ‘conventional’ ML and LLM-based systems, and the completely different way you have to think about the latter.

This was originally published internally around May ‘24 and then presented at NIDC as an ‘Eye Test Model’. I promised that I’d eventually publish it; this is it, dusted off and tidied up for public consumption.

The model is structured across six key capability areas and five maturity levels, providing a roadmap for organizations to evolve their AIOps practices. It draws on the reference materials listed below, is far from original, and is probably already out of date.

References

Capability Areas

  • People: Collaboration and communication among data scientists, data engineers, operations teams and software engineers.
  • Data Management and Exploration: Handling of data sources, data classification, mobility and discovery processes.
  • Generic Model Creation: Data gathering, compute management, and experiment & feedback tracking.
  • Generic Model Deployment and Release: Processes for deploying and releasing models.
  • Large Language Model Ops: Management and deployment of large language models, natural language evaluation, and prompt engineering.
  • Application Integration and Maintenance: Integration of models into applications and maintenance practices.

Maturity Model

People

| Level | Data Scientists | Data Engineers | Software Engineers |
| --- | --- | --- | --- |
| Initial | Siloed, not in regular communication with the larger team | Non-existent | Siloed, receive models remotely from the other team members |
| Minimal | Siloed, occasionally participate in one-way ‘demonstrations’ to the larger team | Siloed, not in regular communication with the larger team | Siloed, receive models remotely from the other team members |
| Procedural | Working directly with data engineers to convert experimentation code into repeatable scripts/jobs | Working with data scientists | Siloed, receive models remotely from the other team members, but have visibility of model pipelines etc. |
| Innovative | Working directly with data engineers to convert experimentation code into manageable services/pipelines | Working with data scientists and software engineers to manage inputs/outputs | Working with data engineers to automate model integration into application code |
| Leading | Working directly with data engineers to convert experimentation code into manageable services/pipelines; working with software engineers to identify markers for data engineers | Working with data scientists and software engineers to manage inputs/outputs | Working with data engineers to automate model integration into application code; implementing post-deployment metrics gathering |

Data Management and Exploration

| Level | Data Sources | Data Store | Data Sensitivity | Data Access | ETL Tasks |
| --- | --- | --- | --- | --- | --- |
| Initial | Disparate data sources with unaligned identifiers/taxonomies | No shared non-production data stores | Un/under-specified, or merely presumed | Individual/local, case-by-case and dataset-by-dataset | Largely script-driven |
| Minimal | Disparate data sources with unaligned identifiers/taxonomies | Shared experimental/dev unstructured data store (with minimal ACLs) | Specified by convention but unenforced | Individual/local, case-by-case and dataset-by-dataset, or ‘need to know’ | Driven by version-controlled transformations |
| Procedural | Disparate data sources with common external identifier mappings / shared but unenforced taxonomies | Common downstream unstructured data store with basic dataset/user-level ACLs | Enforced at the dataset level, specified but unenforced at the attribute level, with informal rules around aggregate sensitivity | Group/RBAC via dataset-specific interfaces | Driven by version-controlled & release-managed transformations, with replica staging deployments for testing |
| Innovative | Aligned/shared data sources with common entity identifiers and unified taxonomies | Common downstream unstructured and analytical data stores with transparent row/entity-level ACLs | Enforced at the row/entity level, specified but unenforced at the attribute level, with formal rules around aggregate sensitivity | Data-catalog-driven discoverability, RBAC for access, ‘need to know’ pathway established and auditable | Driven by version-controlled & release-managed transformations, with replica staging deployments for testing |
| Leading | Aligned/shared data sources with common entity identifiers and unified taxonomies | Common downstream unstructured and analytical data stores with transparent attribute-level ACLs | Enforced at the attribute level, with formal rules around aggregate sensitivity | Universal schema discovery with base RBAC access; automated and audited ‘need to know’ requests; synthetic data for sensitive/confidential streams | Driven by CI/CD transformations, with replica staging deployments for testing |
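
As a rough illustration of what ‘version controlled & release managed transformations’ can look like in practice, here is a minimal Python sketch of a transformation that maps local identifiers onto a common entity identifier and reduces a sensitive attribute before loading downstream. The record shape, identifier mapping, and sensitivity rule are invented for illustration, not taken from any particular platform.

```python
"""Minimal sketch of a version-controlled ETL transformation (names are hypothetical)."""
from dataclasses import dataclass


@dataclass
class SourceRecord:
    local_id: str   # source-specific identifier, unaligned across systems
    postcode: str   # treated as sensitive in this example
    age: int


def transform(records: list[SourceRecord], id_mapping: dict[str, str]) -> list[dict]:
    """Map local identifiers to a common entity identifier and coarsen the
    sensitive attribute into an age band before loading downstream."""
    return [
        {"entity_id": id_mapping[r.local_id], "age_band": (r.age // 10) * 10}
        for r in records
        if r.local_id in id_mapping
    ]


if __name__ == "__main__":
    # In a replica staging deployment this would run against a sampled copy of the source.
    sample = [SourceRecord("abc-1", "BT1 1AA", 34)]
    print(transform(sample, {"abc-1": "ENTITY-0001"}))
```

Because the transformation is an ordinary, importable function, it can be unit-tested and promoted through the same release process as any other code.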

Generic Model Creation

| Level | Data Gathering | Compute Management | Experiment Tracking | End Result |
| --- | --- | --- | --- | --- |
| Initial | Manually | Likely not managed | Not predictably tracked | Single model file manually handed off with inputs/outputs |
| Minimal | Automatically, by per-experiment data pipelines | Managed by team | Not predictably tracked | Training code version controlled; single binary model file manually handed off with inputs/outputs |
| Procedural | Automatically, by shared data pipelines/feature store | Managed as a shared ops capability | Tracked within teams | Both training code and resulting models are version controlled, possibly release managed |
| Innovative | Automatically, by shared data catalog/feature store | Managed as a budgeted and tracked capability | Tracked within teams with shared experiment repositories | Both training code and resulting models are version controlled, release managed, and security tested, with A/B or blue/green deployments; evaluation feedback available to the originating team at staging |
| Leading | Automatically, by distributed data mesh | Managed as a cost centre with data teams as ‘customers’ | Tracked and published internally as derived data products | Retraining triggered automatically based on production metrics; both training code and resulting models are version controlled; multiple model versions deployed at once with continuous evaluation feedback available to the team in production |
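
To make the ‘Tracked within teams’ column concrete, here is a minimal sketch of experiment tracking assuming MLflow as the tracking backend; the experiment name, model, and parameters are illustrative rather than prescriptive.

```python
"""Minimal sketch of team-level experiment tracking, assuming MLflow."""
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A shared, team-visible experiment rather than per-laptop notebooks.
mlflow.set_experiment("churn-baseline")

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    mlflow.log_params(params)  # record the training configuration with the run
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # version the resulting model artifact
```

The same pattern scales to the Innovative level by pointing the tracking URI at a shared server so runs become a team-wide, queryable record rather than local files.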

Generic Model Deployment and Release

| Level | Process | Scoring Script | Release Management |
| --- | --- | --- | --- |
| Initial | Manual | Might be manually created well after experiments, not version controlled | Handled by data scientist or data engineer alone |
| Minimal | Manual | Might be manually created well after experiments, likely version controlled | Handed off to software engineers |
| Procedural | Automatic | Version controlled, with tests | Managed by software engineering team |
| Innovative | Speculative | Triggered by anomaly & correlation detection; version controlled, with tests | Managed by continuous delivery (CI/CD) pipeline |
| Leading | Generative | Triggered by non-statistical events; version controlled, with tests | Managed by continuous integration and CI/CD pipeline |
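
The ‘version controlled, with tests’ cell is worth making concrete: a scoring script kept alongside its tests can move through the same release process as application code. The sketch below assumes a pickled scikit-learn-style model and pytest-style tests; the artifact path and model interface are hypothetical.

```python
"""Minimal sketch of a version-controlled scoring script plus its test."""
import pickle
from pathlib import Path

import numpy as np

MODEL_PATH = Path("artifacts/model.pkl")  # hypothetical release-managed artifact


def load_model(path: Path = MODEL_PATH):
    """Load the released model artifact."""
    with path.open("rb") as f:
        return pickle.load(f)


def score(model, features: list[list[float]]) -> list[float]:
    """Return the positive-class probability for each feature row."""
    return model.predict_proba(features)[:, 1].tolist()


# test_score.py -- lives next to the scoring script and runs in CI on every release
def test_score_returns_probabilities():
    class StubModel:  # stand-in so the test needs no model artifact
        def predict_proba(self, rows):
            return np.array([[0.3, 0.7]] * len(rows))

    result = score(StubModel(), [[1.0, 2.0]])
    assert len(result) == 1
    assert 0.0 <= result[0] <= 1.0
```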

Large Language Model Ops

| Level | Discovery and Testing | Model/Inference Resources | Prompt Management | Deployment | Monitoring |
| --- | --- | --- | --- | --- | --- |
| Initial | Organic discovery of models and testing prompts | | | | Basic lab-driven feedback evaluation and monitoring |
| Minimal | | Shared model/inference resources | Iterative model augmentation with prompt engineering | Structured deployment | Prompt-based feedback evaluations |
| Procedural | | Centralized model/inference resources | Versioned prompt management with RAG / tool calling | Release-driven deployment | Structured deployment and inference-based, feedback-driven evaluations |
| Innovative | Consistently evaluating new models | Model serving/inference ‘as a service’ with resources under IaC | Comprehensive prompt management | Real-time deployment | Advanced monitoring and automated alerts |
| Leading | | | | Seamless, collaborative environment for CI/CD | Fully automated monitoring and model/prompt refinement |
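
As a sketch of what ‘versioned prompt management’ can mean at the Procedural level, prompts can be treated like any other release-managed artefact: kept in version control, referenced by an explicit revision, and rendered at inference time. The registry, template names, and templates below are purely illustrative.

```python
"""Minimal sketch of versioned prompt management; names and templates are illustrative."""

PROMPTS = {
    # Prompts live in version control; a deployment pins one explicit revision of each.
    ("summarise_ticket", "v1"): "Summarise the following support ticket:\n{ticket}",
    ("summarise_ticket", "v2"): (
        "Summarise the following support ticket in three bullet points, "
        "quoting the customer where relevant:\n{ticket}"
    ),
}


def render(name: str, version: str, **variables: str) -> str:
    """Look up a pinned prompt revision and fill in its variables."""
    return PROMPTS[(name, version)].format(**variables)


if __name__ == "__main__":
    print(render("summarise_ticket", "v2", ticket="My order #123 never arrived."))
```

Pinning prompts by revision is what makes release-driven deployment and inference-based evaluation possible: you always know exactly which prompt produced which output.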

Application Integration and Maintenance

| Level | Expertise Reliance | Integration Tests | Release Process | Application Code Tests |
| --- | --- | --- | --- | --- |
| Initial | Heavily reliant on data scientist expertise to implement | | One-off releases | |
| Minimal | Heavily reliant on data scientist expertise to implement model | Basic integration tests exist for the model | Repeated manual releases | Unit tests |
| Procedural | Data scientist expertise required, but co-development with SMEs | Basic integration tests exist for the model | Automated | Unit tests |
| Innovative | Less reliant on data scientist expertise to implement model; SMEs empowered with ‘hands off’ model proposals | Unit and integration tests for each model release | Automated, in regular release/build pipelines | Unit/integration tests |
| Leading | SMEs proposing models that can go to production if they pass ‘gates’ established by data science/ops | Unit and integration tests for each model release | Continuous | Unit/integration tests |
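
One way to read the ‘gates’ in the Leading row is as automated checks that a proposed model must pass before it can be released. A minimal sketch, assuming pytest and an invented hold-out set and threshold:

```python
"""Minimal sketch of a release 'gate' test (run under pytest); data and threshold are invented."""


def load_candidate_model():
    """Stand-in for loading the proposed model artifact from the release pipeline."""
    class Majority:  # trivially predicts the majority class, for illustration only
        def predict(self, rows):
            return [1 for _ in rows]
    return Majority()


# A tiny, invented hold-out set; in practice this would be an agreed golden dataset.
GOLDEN_SET = [([0.1, 0.2], 1), ([0.9, 0.4], 1), ([0.5, 0.5], 0)]


def test_candidate_meets_accuracy_gate():
    model = load_candidate_model()
    rows = [features for features, _ in GOLDEN_SET]
    labels = [label for _, label in GOLDEN_SET]
    accuracy = sum(p == y for p, y in zip(model.predict(rows), labels)) / len(labels)
    assert accuracy >= 0.6, "candidate model fails the agreed release gate"
```

Because the gate is just a test in the release pipeline, an SME-proposed model either clears the agreed bar automatically or is rejected without needing a data scientist in the loop for every release.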