Why On-Prem MLOps?
“Just use SageMaker” doesn’t work when:
- Patient data can’t leave the hospital network (HIPAA/GDPR)
- Defense workloads require air-gapped environments
- GPU cloud costs at scale exceed hardware ownership
- Pulling a 50TB training dataset out of cloud storage costs roughly $5K per month in egress fees alone
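That egress figure is easy to sanity-check. Assuming a typical rate of about $0.09/GB (rates vary by provider, tier, and volume; the helper below is my own, not a billing API), a back-of-envelope sketch:

```python
# Back-of-envelope egress cost for moving a training dataset out of the cloud.
# The $0.09/GB rate is an assumption; actual rates vary by provider and volume tier.
def monthly_egress_cost(dataset_tb: float, rate_per_gb: float = 0.09) -> float:
    """Cost of transferring the full dataset out of the cloud once per month."""
    return dataset_tb * 1024 * rate_per_gb

cost = monthly_egress_cost(50)   # 50 TB pulled out once a month
print(f"${cost:,.0f}/month")     # ~$4,608/month, i.e. roughly $5K
```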
Kubeflow on Kubernetes gives you SageMaker-class ML infrastructure on hardware you control.
Architecture
On-Prem Kubernetes Cluster
├── Kubeflow Central Dashboard
├── Kubeflow Pipelines (orchestration)
├── KServe (model serving)
├── Katib (hyperparameter tuning)
├── Training Operator (distributed training)
├── MinIO (artifact storage)
├── MySQL (metadata store)
└── GPU Nodes (NVIDIA A100/H100)
Installation with Kustomize
# Clone Kubeflow manifests
git clone https://github.com/kubeflow/manifests.git
cd manifests
# Install everything; the loop retries until the CRDs are registered
# and every resource applies cleanly
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying..."
  sleep 10
done
For production, customize the installation:
# kustomization.yaml: production overrides
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# Pin remote bases to a release tag for reproducible builds
- github.com/kubeflow/manifests//common/cert-manager?ref=v1.9.0
- github.com/kubeflow/manifests//common/istio?ref=v1.9.0
- github.com/kubeflow/manifests//apps/pipeline?ref=v1.9.0
- github.com/kubeflow/manifests//apps/kserve?ref=v1.9.0
- github.com/kubeflow/manifests//apps/training-operator?ref=v1.9.0
- github.com/kubeflow/manifests//apps/katib?ref=v1.9.0
patches:
- path: patches/minio-pvc.yaml      # Use real storage, not emptyDir
- path: patches/mysql-ha.yaml       # HA MySQL for metadata
- path: patches/gpu-nodepool.yaml   # GPU scheduling config
Building an ML Pipeline
from kfp import dsl, compiler

@dsl.component(base_image="python:3.12")
def preprocess_data(input_path: str, output_path: dsl.OutputPath(str)):
    import pandas as pd
    df = pd.read_parquet(input_path)
    df = df.dropna().reset_index(drop=True)
    # Feature engineering...
    df.to_parquet(output_path)

@dsl.component(base_image="pytorch/pytorch:2.5-cuda12.4")
def train_model(data_path: str, model_path: dsl.OutputPath(str), epochs: int = 10):
    import torch
    model = torch.nn.Linear(64, 1)  # placeholder; real architecture elided
    # Training loop...
    torch.save(model.state_dict(), model_path)

@dsl.component(base_image="python:3.12")
def evaluate_model(model_path: str, test_data: str) -> float:
    # Evaluation...
    accuracy = 0.0  # placeholder; computed from the held-out test set
    return accuracy

@dsl.component
def deploy_model(model_path: str, accuracy: float):
    if accuracy > 0.95:
        # Deploy to KServe
        pass

@dsl.pipeline(name="training-pipeline")
def ml_pipeline(data_path: str):
    preprocess = preprocess_data(input_path=data_path)
    train = train_model(data_path=preprocess.output, epochs=20)
    evaluate = evaluate_model(
        model_path=train.output,
        test_data=data_path,
    )
    deploy_model(
        model_path=train.output,
        accuracy=evaluate.output,
    )

compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")
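The deploy step is stubbed out above. One way to flesh it out is to build a KServe InferenceService manifest and apply it with the Kubernetes client. A minimal sketch, assuming KServe's v1beta1 schema; the name, namespace, and storage URI below are illustrative placeholders:

```python
# Build a KServe InferenceService manifest for a trained PyTorch model.
# Name, namespace, and storage URI are illustrative, not from any real cluster.
def make_inference_service(name: str, storage_uri: str,
                           namespace: str = "kubeflow-user") -> dict:
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "pytorch"},
                    "storageUri": storage_uri,  # e.g. a MinIO/S3 path
                },
            },
        },
    }

isvc = make_inference_service("churn-model", "s3://models/churn/v3")
```

The resulting dict can be created via the Kubernetes CustomObjects API or serialized to YAML and piped to `kubectl apply -f -`.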
GPU Management
# Pod requesting GPUs (requires the NVIDIA GPU Operator's device plugin on the node)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  containers:
  - name: trainer
    image: registry.internal/ml-trainer:v2
    resources:
      limits:
        nvidia.com/gpu: 4  # Request 4 GPUs
    # No CUDA_VISIBLE_DEVICES needed: the device plugin injects the
    # allocated GPUs into the container automatically
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
For multi-GPU distributed training:
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: distributed-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          containers:
          - name: pytorch   # the Training Operator expects this container name
            image: registry.internal/trainer:v2
            resources:
              limits:
                nvidia.com/gpu: 4
    Worker:
      replicas: 3
      template:
        spec:
          containers:
          - name: pytorch
            image: registry.internal/trainer:v2
            resources:
              limits:
                nvidia.com/gpu: 4
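The Training Operator injects `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` into each replica, so the training code can call `torch.distributed.init_process_group(backend="nccl")` and pick everything up from the environment. The effective world size follows from the replica spec; a quick sketch (the helper and its input shape are my own, not the operator's API):

```python
# World size for a distributed job: each replica (master + workers) contributes
# one rank per GPU when running one process per GPU.
def world_size(replica_specs: dict) -> int:
    return sum(spec["replicas"] * spec["gpus_per_replica"]
               for spec in replica_specs.values())

spec = {
    "Master": {"replicas": 1, "gpus_per_replica": 4},
    "Worker": {"replicas": 3, "gpus_per_replica": 4},
}
print(world_size(spec))  # 16 ranks across 4 pods
```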
Infrastructure Automation
I deploy the entire Kubeflow stack with Ansible:
- name: Deploy Kubeflow on-prem
  hosts: k8s_ml_cluster
  roles:
    - role: nvidia-gpu-operator
    - role: minio-ha
    - role: mysql-ha
    - role: kubeflow
  vars:
    kubeflow_version: "1.9"
    storage_class: ceph-block
    gpu_scheduling: exclusive
For Kubernetes cluster provisioning, see Kubernetes Recipes; for Ansible automation, Ansible Pilot; and for infrastructure provisioning with Terraform, Terraform Pilot.
On-Prem vs Cloud Cost (3-Year TCO)
Workload: 8 GPUs, continuous training + serving
Cloud (AWS p4d.24xlarge, 8× A100):
  On-demand: $32.77/hr × 8,760 h/yr × 3 yr ≈ $861,000
  3-yr reserved: $19.22/hr × 8,760 h/yr × 3 yr ≈ $505,000

On-Prem (8× A100 server):
  Hardware: $250,000
  Colocation: $2,000/mo × 36 mo = $72,000
  Power: $1,500/mo × 36 mo = $54,000
  Ops (0.5 FTE): $75,000/yr × 3 yr = $225,000
  Total: $601,000
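The arithmetic behind these figures, plus the break-even utilization against on-demand pricing (prices as listed above; check current AWS rates before deciding):

```python
HOURS_3YR = 8_760 * 3  # 26,280 hours over three years

on_demand = 32.77 * HOURS_3YR                     # ~$861K
reserved  = 19.22 * HOURS_3YR                     # ~$505K
on_prem   = 250_000 + 72_000 + 54_000 + 225_000   # $601K total

# Fraction of on-demand hours at which on-prem becomes cheaper over 3 years
breakeven = on_prem / on_demand                   # ~0.70, i.e. ~70% utilization
print(f"on-demand ${on_demand:,.0f}  reserved ${reserved:,.0f}  "
      f"on-prem ${on_prem:,.0f}  break-even {breakeven:.0%}")
```

Note that the break-even is against on-demand pricing; a full 3-year reserved commitment is the tougher comparator, which is what makes the hybrid approach below attractive.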
On these list prices, a full 3-year reserved commitment actually edges out on-prem; on-prem pulls ahead once the hardware runs past its depreciation window, when the 0.5 FTE is amortized across clusters, and once egress fees are counted. In short: on-prem wins at sustained high utilization over the hardware's lifetime, cloud wins for bursty workloads, and hybrid (on-prem base + cloud burst) is often optimal.
The Regulated Industry Advantage
For healthcare, defense, and financial services, on-prem MLOps isn’t just about cost — it’s about compliance. Data never leaves your network, model provenance is fully auditable, and you control the entire stack. Kubeflow makes this enterprise-grade without building everything from scratch.