Why On-Prem MLOps?
"Just use SageMaker" doesn't work when:
- Patient data can't leave the hospital network (HIPAA/GDPR)
- Defense workloads require air-gapped environments
- GPU cloud costs at scale exceed hardware ownership
- Moving your 50TB training dataset out of cloud storage runs ~$5K/month in egress fees alone
Kubeflow on Kubernetes gives you SageMaker-class ML infrastructure on hardware you control.
Architecture
On-Prem Kubernetes Cluster
├── Kubeflow Central Dashboard
├── Kubeflow Pipelines (orchestration)
├── KServe (model serving)
├── Katib (hyperparameter tuning)
├── Training Operator (distributed training)
├── MinIO (artifact storage)
├── MySQL (metadata store)
└── GPU Nodes (NVIDIA A100/H100)

Installation with Kustomize
# Clone Kubeflow manifests
git clone https://github.com/kubeflow/manifests.git
cd manifests
# Install everything
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying..."
  sleep 10
done

For production, customize the installation:
# kustomization.yaml - production overrides
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- github.com/kubeflow/manifests//common/cert-manager
- github.com/kubeflow/manifests//common/istio
- github.com/kubeflow/manifests//apps/pipeline
- github.com/kubeflow/manifests//apps/kserve
- github.com/kubeflow/manifests//apps/training-operator
- github.com/kubeflow/manifests//apps/katib
patches:
- path: patches/minio-pvc.yaml # Use real storage, not emptyDir
- path: patches/mysql-ha.yaml # HA MySQL for metadata
- path: patches/gpu-nodepool.yaml # GPU scheduling config

Building an ML Pipeline
from kfp import dsl, compiler
@dsl.component(base_image="python:3.12")
def preprocess_data(input_path: str, output_path: dsl.OutputPath()):
    import pandas as pd
    df = pd.read_parquet(input_path)
    df = df.dropna().reset_index(drop=True)
    # Feature engineering...
    df.to_parquet(output_path)

@dsl.component(base_image="pytorch/pytorch:2.5-cuda12.4")
def train_model(data_path: str, model_path: dsl.OutputPath(), epochs: int = 10):
    import torch
    model = ...  # build the model and run the training loop...
    torch.save(model.state_dict(), model_path)

@dsl.component(base_image="python:3.12")
def evaluate_model(model_path: str, test_data: str) -> float:
    accuracy = ...  # evaluation...
    return accuracy

@dsl.component
def deploy_model(model_path: str, accuracy: float):
    if accuracy > 0.95:
        # Deploy to KServe
        pass

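The deploy step above is left as a stub. One way to sketch it is to build a KServe InferenceService manifest and apply it through the Kubernetes API; the resource name, namespace, and storage URI below are illustrative assumptions, not part of the pipeline above:

```python
def inference_service_manifest(name: str, storage_uri: str,
                               namespace: str = "models") -> dict:
    # Minimal KServe v1beta1 InferenceService with a PyTorch predictor.
    # Name, namespace, and storage URI are placeholders for your environment.
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"predictor": {"pytorch": {"storageUri": storage_uri}}},
    }

# Applying it requires in-cluster credentials and the kubernetes client:
# from kubernetes import client, config
# config.load_incluster_config()
# client.CustomObjectsApi().create_namespaced_custom_object(
#     group="serving.kserve.io", version="v1beta1", namespace="models",
#     plural="inferenceservices",
#     body=inference_service_manifest("churn-model", "s3://models/churn"),
# )
```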
@dsl.pipeline(name="training-pipeline")
def ml_pipeline(data_path: str):
    preprocess = preprocess_data(input_path=data_path)
    train = train_model(data_path=preprocess.output, epochs=20)
    evaluate = evaluate_model(
        model_path=train.output,
        test_data=data_path,
    )
    deploy_model(
        model_path=train.output,
        accuracy=evaluate.output,
    )

compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")
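Compiling produces pipeline.yaml, which still has to be submitted to the Pipelines API. A hedged sketch: the in-cluster service host and dataset path below are assumptions for a default Kubeflow install, and the argument keys must match ml_pipeline's parameter names:

```python
def run_arguments(data_path: str) -> dict:
    # Run parameters, keyed by the pipeline function's argument names.
    return {"data_path": data_path}

def submit(pipeline_file: str = "pipeline.yaml",
           host: str = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"):
    # Requires the kfp SDK and network access to the KFP API server;
    # the host URL above is an assumed in-cluster service address.
    import kfp
    client = kfp.Client(host=host)
    return client.create_run_from_pipeline_package(
        pipeline_file,
        arguments=run_arguments("s3://datasets/train.parquet"),
    )
```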
GPU Management

# NVIDIA GPU Operator for Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  containers:
  - name: trainer
    image: registry.internal/ml-trainer:v2
    resources:
      limits:
        nvidia.com/gpu: 4  # Request 4 GPUs
    env:
    - name: CUDA_VISIBLE_DEVICES
      value: "0,1,2,3"
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB

For multi-GPU distributed training:
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: distributed-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          containers:
          - name: trainer
            image: registry.internal/trainer:v2
            resources:
              limits:
                nvidia.com/gpu: 4
    Worker:
      replicas: 3
      template:
        spec:
          containers:
          - name: trainer
            image: registry.internal/trainer:v2
            resources:
              limits:
                nvidia.com/gpu: 4
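Inside each replica, the Training Operator injects the standard PyTorch rendezvous variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE), so the trainer entrypoint can initialize the process group with env:// rendezvous. A minimal sketch; the helper names are my own:

```python
import os

def replica_env(environ=os.environ) -> dict:
    # Rendezvous variables the Training Operator injects into each replica.
    return {
        "rank": int(environ.get("RANK", "0")),
        "world_size": int(environ.get("WORLD_SIZE", "1")),
        "master_addr": environ.get("MASTER_ADDR", "localhost"),
        "master_port": int(environ.get("MASTER_PORT", "29500")),
    }

def setup_distributed():
    # Run inside the PyTorchJob container; needs torch built with NCCL.
    import torch
    import torch.distributed as dist
    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))
    return replica_env()
```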
Infrastructure Automation

I deploy the entire Kubeflow stack with Ansible:
- name: Deploy Kubeflow on-prem
  hosts: k8s_ml_cluster
  roles:
    - role: nvidia-gpu-operator
    - role: minio-ha
    - role: mysql-ha
    - role: kubeflow
  vars:
    kubeflow_version: "1.9"
    storage_class: ceph-block
    gpu_scheduling: exclusive

Kubernetes cluster provisioning at Kubernetes Recipes. Ansible automation at Ansible Pilot. Infrastructure provisioning with Terraform at Terraform Pilot.
On-Prem vs Cloud Cost (3-Year TCO)
Workload: 8 GPUs, continuous training + serving
Cloud (AWS p4d.24xlarge):
On-demand: $32.77/hr × 8,760h × 3yr ≈ $861,000
Reserved: $19.22/hr × 8,760h × 3yr ≈ $505,000
On-Prem (8× A100 server):
Hardware: $250,000
Colocation: $2,000/mo × 36mo = $72,000
Power: $1,500/mo × 36mo = $54,000
Ops (0.5 FTE): $75,000/yr × 3 = $225,000
Total: $601,000

On-prem beats on-demand pricing at sustained high utilization, though three-year reserved instances close most of the gap. Cloud wins for bursty workloads. Hybrid (on-prem base + cloud burst) is often optimal.
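The totals are straightforward arithmetic on the figures above (8,760 hours/year of continuous use); a quick sanity check:

```python
HOURS_3YR = 8_760 * 3  # hours in three years of continuous operation

def cloud_tco(hourly_rate: float) -> float:
    # Continuous usage billed at the given hourly rate for three years.
    return hourly_rate * HOURS_3YR

def onprem_tco(hardware=250_000, colo_per_month=2_000, power_per_month=1_500,
               ops_per_year=75_000, months=36, years=3) -> float:
    # Defaults taken from the cost breakdown above.
    return hardware + (colo_per_month + power_per_month) * months + ops_per_year * years

on_demand = cloud_tco(32.77)  # ≈ $861,196
reserved = cloud_tco(19.22)   # ≈ $505,102
on_prem = onprem_tco()        # $601,000
```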
The Regulated Industry Advantage
For healthcare, defense, and financial services, on-prem MLOps isn't just about cost: it's about compliance. Data never leaves your network, model provenance is fully auditable, and you control the entire stack. Kubeflow makes this enterprise-grade without building everything from scratch.
