Automation

MLOps Pipeline Automation with Ansible and Kubeflow

Luca Berton • 1 min read
#mlops #ansible #kubeflow #automation #ai

## 🔄 MLOps: Where Ansible Meets Kubeflow

MLOps pipelines need two things: reproducible ML workflows (Kubeflow) and reproducible infrastructure (Ansible). Together, they automate the entire lifecycle from data preparation to model serving.

## Architecture

```text
Ansible (Infrastructure Layer)
  ├── Provision GPU nodes
  ├── Install Kubeflow
  ├── Configure storage (S3/Ceph)
  └── Set up monitoring

Kubeflow (ML Layer)
  ├── Data preparation pipeline
  ├── Model training
  ├── Evaluation & validation
  └── Model deployment (KServe)
```

## Ansible: The Infrastructure Layer

### Install Kubeflow

```yaml
---
- name: Deploy Kubeflow on Kubernetes
  hosts: k8s_control_plane
  tasks:
    - name: Add Kubeflow manifests
      ansible.builtin.git:
        repo: https://github.com/kubeflow/manifests
        dest: /opt/kubeflow-manifests
        version: v1.9.0

    - name: Install Kubeflow with kustomize
      kubernetes.core.k8s:
        state: present
        definition: "{{ lookup('kubernetes.core.kustomize', dir=item) }}"
      loop:
        - /opt/kubeflow-manifests/common/cert-manager/
        - /opt/kubeflow-manifests/common/istio/
        - /opt/kubeflow-manifests/apps/pipeline/

    - name: Label GPU nodes
      kubernetes.core.k8s:
        state: patched
        kind: Node
        name: "{{ item }}"
        definition:
          metadata:
            labels:
              accelerator: nvidia-a100
      loop: "{{ gpu_nodes }}"

    - name: Install NVIDIA device plugin
      kubernetes.core.helm:
        name: nvidia-device-plugin
        chart_ref: nvidia/k8s-device-plugin
        chart_repo_url: https://nvidia.github.io/k8s-device-plugin
        release_namespace: kube-system
```
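The playbook assumes a `k8s_control_plane` inventory group and a `gpu_nodes` list variable. A minimal inventory sketch (hostnames are placeholders, not from the article):

```yaml
all:
  children:
    k8s_control_plane:
      hosts:
        cp-1.example.internal:
  vars:
    gpu_nodes:
      - gpu-node-1
      - gpu-node-2
```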

## Kubeflow: The ML Layer

### Training Pipeline

```python
from kfp import dsl

@dsl.component(base_image="python:3.11", packages_to_install=["pandas", "pyarrow"])
def prepare_data(dataset_path: str, output_path: dsl.Output[dsl.Dataset]):
    import pandas as pd
    # Drop missing values and duplicates before training
    df = pd.read_parquet(dataset_path)
    df_clean = df.dropna().drop_duplicates()
    df_clean.to_parquet(output_path.path)

@dsl.component(base_image="pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime")
def train_model(dataset: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
    import torch
    trained_model = torch.nn.Linear(10, 1)  # placeholder: real training logic goes here
    torch.save(trained_model.state_dict(), model.path)

@dsl.component(base_image="python:3.11")
def evaluate_model(model: dsl.Input[dsl.Model]) -> float:
    accuracy = 0.0  # placeholder: real evaluation logic goes here
    return accuracy

@dsl.pipeline(name="training-pipeline")
def training_pipeline(dataset_path: str):
    data = prepare_data(dataset_path=dataset_path)
    model = train_model(dataset=data.outputs["output_path"])
    evaluation = evaluate_model(model=model.outputs["model"])
```

## Automated Retraining with Ansible + Cron

```yaml
---
- name: Trigger model retraining
  hosts: localhost
  tasks:
    - name: Check model drift
      ansible.builtin.uri:
        url: "http://monitoring.internal/api/v1/query"
        method: POST
        body_format: json
        body:
          query: "model_accuracy_score < 0.85"
      register: drift_check

    - name: Trigger Kubeflow pipeline
      ansible.builtin.uri:
        url: "http://kubeflow.internal/pipeline/apis/v2beta1/runs"
        method: POST
        body_format: json
        body:
          display_name: "Automated retrain - {{ ansible_date_time.iso8601 }}"
          pipeline_version_reference:
            pipeline_id: "training-pipeline"
      when: drift_check.json.data.result | length > 0
```
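The `when:` condition above assumes a Prometheus-style response body; the same decision expressed in plain Python (the payloads below are illustrative):

```python
def should_retrain(drift_response: dict) -> bool:
    """Mirror of the playbook condition
    `drift_check.json.data.result | length > 0`: any matching
    series in the query result means the model has drifted."""
    return len(drift_response.get("data", {}).get("result", [])) > 0

# Illustrative payloads in the Prometheus query-response shape
drifted = {"data": {"result": [{"metric": {"model": "churn"}, "value": [1700000000, "0.81"]}]}}
healthy = {"data": {"result": []}}

print(should_retrain(drifted))   # True
print(should_retrain(healthy))   # False
```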

## Key Practices

  1. Version everything — data, code, models, and infrastructure
  2. Ansible for infra, Kubeflow for ML — don't mix concerns
  3. Automate retraining — trigger on drift, not on schedule
  4. Test in staging — full pipeline dry runs before production
  5. Track lineage — every model should trace back to its data and code
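Practice 5 can start as simply as stamping every model artifact with digests of the exact data and code that produced it; a minimal sketch (the function and field names are invented for illustration):

```python
import hashlib
import json

def lineage_record(data_bytes: bytes, code_bytes: bytes, model_name: str) -> dict:
    """Tie a model to the exact data and code that produced it."""
    return {
        "model": model_name,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "code_sha256": hashlib.sha256(code_bytes).hexdigest(),
    }

# Store this record next to the model artifact (e.g. as registry metadata)
record = lineage_record(b"col_a,col_b\n1,2\n", b"def train(): ...", "churn-v3")
print(json.dumps(record, indent=2))
```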

Building MLOps pipelines? I help teams automate the full ML lifecycle with Ansible and Kubeflow. Let's connect.


Luca Berton

AI & Cloud Advisor with 18+ years of experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
