Rust Kubernetes Operators: kube-rs Controllers (2026)

Go has been the default language for Kubernetes operators since the ecosystem began. But Rust operators are gaining traction in 2026, not for ideology but for measurable production benefits: roughly 10x less memory, no GC pauses during reconciliation, and compile-time guarantees that rule out entire categories of runtime errors, such as data races and unchecked null values.

Why Rust for Operators?

A Go operator managing 10,000 custom resources typically consumes 200-500 MB of RAM; an equivalent Rust operator uses 20-50 MB. When you run operators on every cluster in a fleet, that difference can add up to the cost of an extra node.

Beyond memory:

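No garbage collection pauses during reconciliation: predictable latency
Typed API structures: misspelled field names fail to compile, and a payload missing a required (non-Option) field is rejected during deserialization instead of becoming a silent zero value
Single binary deployment: no language runtime to install, so the container image can be little more than the binary
Thread safety guarantees: the borrow checker and the Send/Sync traits reject data races in concurrent reconcilers at compile time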
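The thread-safety point can be shown without kube-rs. This is a minimal std-only sketch. It assumes a hypothetical in-memory cache shared by several concurrent reconcile workers, with plain threads standing in for the async tasks a real controller runs. If you remove the Mutex and call insert through the Arc, the program is rejected at compile time rather than racing in production.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state: how many times each resource was reconciled.
type ReconcileCounts = Arc<Mutex<HashMap<String, u32>>>;

/// Simulates several workers reconciling the same resources concurrently.
/// The Mutex is required: `Arc` alone only hands out shared (`&`) access,
/// so calling `insert` through it would not compile.
pub fn run_workers(resources: &[&str], workers: usize) -> HashMap<String, u32> {
    let counts: ReconcileCounts = Arc::new(Mutex::new(HashMap::new()));
    let mut handles = Vec::new();

    for _ in 0..workers {
        let counts = Arc::clone(&counts);
        let names: Vec<String> = resources.iter().map(|s| s.to_string()).collect();
        handles.push(thread::spawn(move || {
            for name in names {
                // Lock, update the counter, and drop the guard at the end of the statement.
                *counts.lock().unwrap().entry(name).or_insert(0) += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // Every worker has joined, so this is the last remaining Arc handle.
    Arc::try_unwrap(counts).unwrap().into_inner().unwrap()
}
```

With two resources and four workers, each entry ends up at 4 no matter how the threads interleave.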

The kube-rs Ecosystem

The Rust Kubernetes ecosystem centers on kube-rs:

use kube::{Api, Client, CustomResource};
use kube::runtime::controller::{Action, Controller};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::sync::Arc;

// Define your Custom Resource
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(group = "ai.example.com", version = "v1", kind = "InferenceService")]
#[kube(namespaced, status = "InferenceServiceStatus")]
pub struct InferenceServiceSpec {
    pub model: String,
    pub replicas: i32,
    pub gpu_type: String,
    pub max_batch_size: Option<u32>,
}

#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
pub struct InferenceServiceStatus {
    pub ready_replicas: i32,
    pub endpoint: Option<String>,
    pub phase: String,
}
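Deriving `JsonSchema` lets the generated CRD carry an OpenAPI schema, so the API server rejects values of the wrong type. Value rules, such as an allowed list of GPU types, still need an explicit check. This is a minimal sketch, assuming a hypothetical `validate_spec` helper called at the top of `reconcile`. It uses a std-only mirror of the spec so it compiles without kube-rs. The allowed GPU types are the three named in the `InvalidGpuType` error message later in this article.

```rust
/// Std-only mirror of InferenceServiceSpec so this sketch compiles without kube-rs.
#[derive(Debug, Clone)]
pub struct InferenceServiceSpec {
    pub model: String,
    pub replicas: i32,
    pub gpu_type: String,
    pub max_batch_size: Option<u32>,
}

/// Hypothetical validation errors for value-level rules.
#[derive(Debug, PartialEq)]
pub enum SpecError {
    EmptyModel,
    NegativeReplicas(i32),
    InvalidGpuType(String),
    ZeroBatchSize,
}

// GPU types taken from the article's InvalidGpuType message.
const ALLOWED_GPUS: [&str; 3] = ["A100", "H100", "L40S"];

/// Hypothetical helper: checks rules that the OpenAPI type schema does not express.
pub fn validate_spec(spec: &InferenceServiceSpec) -> Result<(), SpecError> {
    if spec.model.trim().is_empty() {
        return Err(SpecError::EmptyModel);
    }
    if spec.replicas < 0 {
        return Err(SpecError::NegativeReplicas(spec.replicas));
    }
    if !ALLOWED_GPUS.contains(&spec.gpu_type.as_str()) {
        return Err(SpecError::InvalidGpuType(spec.gpu_type.clone()));
    }
    // max_batch_size is optional, but when present it must be at least 1.
    if spec.max_batch_size == Some(0) {
        return Err(SpecError::ZeroBatchSize);
    }
    Ok(())
}
```

A failed check like this is usually better written into the status `phase` than retried. Requeueing cannot fix a bad spec until the user edits it.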

The Reconciliation Loop

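// In addition to the imports in the CRD example above
use k8s_openapi::api::apps::v1::Deployment;
use kube::api::{Patch, PatchParams, PostParams};
use kube::ResourceExt; // provides namespace() and name_any()
use std::time::Duration;

// Shared state handed to every reconcile call
pub struct Context {
    pub client: Client,
}

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("kubernetes API error: {0}")]
    Kube(#[from] kube::Error),
}

// needs_update, build_deployment, build_deployment_patch and update_status
// are application-specific helpers (not shown)
async fn reconcile(
    svc: Arc<InferenceService>,
    ctx: Arc<Context>,
) -> Result<Action, Error> {
    let client = &ctx.client;
    let ns = svc.namespace().unwrap_or_default();
    let name = svc.name_any();

    // Ensure the Deployment exists and matches the spec
    let deployments: Api<Deployment> = Api::namespaced(client.clone(), &ns);
    match deployments.get_opt(&name).await? {
        Some(existing) => {
            // Update only if the spec changed. Server-side apply expects a full
            // object (with apiVersion and kind) wrapped in Patch::Apply.
            if needs_update(&existing, &svc.spec) {
                let patch = build_deployment_patch(&svc);
                let params = PatchParams::apply("inference-operator").force();
                deployments.patch(&name, &params, &Patch::Apply(&patch)).await?;
            }
        }
        None => {
            // Create the Deployment; build_deployment should set an ownerReference to svc
            let deployment = build_deployment(&svc);
            deployments.create(&PostParams::default(), &deployment).await?;
        }
    }

    // Update status
    update_status(client, &svc).await?;

    // Requeue after 30 seconds even if nothing changed
    Ok(Action::requeue(Duration::from_secs(30)))
}

fn error_policy(
    _svc: Arc<InferenceService>,
    error: &Error,
    _ctx: Arc<Context>,
) -> Action {
    tracing::error!(%error, "reconciliation failed");
    Action::requeue(Duration::from_secs(60))
}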
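The `error_policy` above requeues every failure after a flat 60 seconds. A common alternative is capped exponential backoff per object. This is a minimal std-only sketch. It assumes the failure count lives in a hypothetical `Backoff` field inside `Context`, because kube-rs passes `error_policy` the object and the error but no attempt counter. The 5-second base and 300-second cap are illustrative values, not kube-rs defaults.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical per-object failure tracker that could live in the controller Context.
#[derive(Default)]
pub struct Backoff {
    failures: Mutex<HashMap<String, u32>>,
}

impl Backoff {
    const BASE_SECS: u64 = 5; // delay after the first failure (illustrative)
    const MAX_SECS: u64 = 300; // upper bound on the delay (illustrative)

    /// Records a failure for `key` (e.g. "namespace/name") and returns the
    /// delay before the next retry: 5s, 10s, 20s, ... capped at 300s.
    pub fn on_error(&self, key: &str) -> Duration {
        let mut failures = self.failures.lock().unwrap();
        let attempts = failures.entry(key.to_string()).or_insert(0);
        *attempts += 1;
        // Clamp the exponent so the shift cannot overflow on long failure streaks.
        let exp = (*attempts - 1).min(16);
        let secs = Self::BASE_SECS.saturating_mul(1u64 << exp).min(Self::MAX_SECS);
        Duration::from_secs(secs)
    }

    /// Clears the counter after a successful reconcile.
    pub fn on_success(&self, key: &str) {
        self.failures.lock().unwrap().remove(key);
    }
}
```

In `error_policy`, this could become `Action::requeue(ctx.backoff.on_error(&key))`, keyed by namespace and name, with `on_success` called at the end of a successful `reconcile`.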

Setting Up the Controller

use futures::StreamExt; // for .for_each on the controller's output stream

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt::init();

    let client = Client::try_default().await?;
    // Watch InferenceServices and the Deployments they own, in all namespaces
    let services: Api<InferenceService> = Api::all(client.clone());
    let deployments: Api<Deployment> = Api::all(client.clone());

    let ctx = Arc::new(Context { client });

    Controller::new(services, Default::default())
        .owns(deployments, Default::default())
        .run(reconcile, error_policy, ctx)
        .for_each(|result| async move {
            match result {
                Ok((_obj, _action)) => {}
                Err(e) => tracing::error!(%e, "controller error"),
            }
        })
        .await;

    Ok(())
}
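`.owns(deployments, ...)` re-triggers an `InferenceService` only when a Deployment event can be traced back to it through the Deployment's `ownerReferences`. In kube-rs, `svc.controller_owner_ref(&())` builds that reference for use in `build_deployment`. Below is a simplified std-only model of the mapping. The `OwnerRef` struct and `owners_to_reconcile` function are hypothetical stand-ins, meant to illustrate the idea rather than reproduce kube-rs internals.

```rust
/// Hypothetical simplified stand-in for a Kubernetes OwnerReference.
#[derive(Debug, Clone)]
pub struct OwnerRef {
    pub api_version: String,
    pub kind: String,
    pub name: String,
}

/// Conceptual model: maps an owned object's owner references to the
/// (namespace, name) keys of owners of the given kind. Owner references carry
/// no namespace, and a namespaced owner must live in the child's namespace,
/// so the child's namespace is reused.
pub fn owners_to_reconcile(
    child_namespace: &str,
    owner_refs: &[OwnerRef],
    api_version: &str,
    kind: &str,
) -> Vec<(String, String)> {
    owner_refs
        .iter()
        .filter(|r| r.api_version == api_version && r.kind == kind)
        .map(|r| (child_namespace.to_string(), r.name.clone()))
        .collect()
}
```

A Deployment without a matching owner reference yields no keys. Changes to it therefore never wake the controller, which is the usual symptom of a `build_deployment` that forgets to set the reference.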

Memory Comparison: Go vs Rust Operator

Benchmarked with 5,000 custom resources under watch:

Metric	Go (controller-runtime)	Rust (kube-rs)
RSS at startup	45 MB	8 MB
RSS at 5K resources	320 MB	28 MB
P99 reconcile latency	12 ms	2 ms
GC pause (max)	15 ms	0 ms (no GC)
Binary size	35 MB	12 MB
Container image	50 MB (distroless)	15 MB (scratch)
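You can read the RSS rows as a startup cost plus a per-resource increment. That gives a rough sense of where the memory gap comes from. This is simple arithmetic on the table's own numbers under an assumed linear model, not an additional measurement.

```rust
/// Incremental memory per watched resource in KB (1 MB = 1000 KB),
/// assuming RSS grows linearly from the startup figure (a simplification).
pub fn kb_per_resource(rss_startup_mb: f64, rss_loaded_mb: f64, resources: f64) -> f64 {
    (rss_loaded_mb - rss_startup_mb) * 1000.0 / resources
}
```

With the table's figures, `kb_per_resource(45.0, 320.0, 5000.0)` gives 55 KB per resource for the Go operator. `kb_per_resource(8.0, 28.0, 5000.0)` gives 4 KB for the Rust one.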

Error Handling Patterns

Rust’s type system shines for operator reliability:

#[derive(Debug, thiserror::Error)]
enum OperatorError {
    #[error("kubernetes API error: {0}")]
    Kube(#[from] kube::Error),

    #[error("missing field: {0}")]
    MissingField(&'static str),

    #[error("invalid GPU type: {0} (expected: A100, H100, L40S)")]
    InvalidGpuType(String),

    #[error("reconciliation timeout after {0:?}")]
    Timeout(Duration),
}

// A match on OperatorError must cover every variant, and ? makes each
// fallible call explicit. Rust can still panic (out-of-range indexing,
// unwrap on None), so reconcilers should prefer .get() and ? over both.
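Exhaustive matching is what makes the enum useful in `error_policy`. A `match` with no wildcard arm must name every variant, so adding a variant later breaks the build until its retry behavior is decided. This is a minimal sketch, assuming a std-only copy of the enum in which a hypothetical `Api(String)` variant stands in for `kube::Error`. The `Retry` type and its delays are illustrative, not part of kube-rs.

```rust
use std::time::Duration;

/// Std-only copy of OperatorError; the hypothetical `Api(String)` stands in for `kube::Error`.
#[derive(Debug)]
pub enum OperatorError {
    Api(String),
    MissingField(&'static str),
    InvalidGpuType(String),
    Timeout(Duration),
}

/// Hypothetical retry decision for a failed reconcile.
#[derive(Debug, PartialEq)]
pub enum Retry {
    /// Transient failure: try again after the given delay.
    After(Duration),
    /// The spec itself is wrong: retrying cannot help until the user edits it.
    WaitForSpecChange,
}

/// No `_` arm, so a new OperatorError variant is a compile error here.
pub fn classify(err: &OperatorError) -> Retry {
    match err {
        OperatorError::Api(_) => Retry::After(Duration::from_secs(10)),
        // Back off proportionally to how long the timed-out attempt took.
        OperatorError::Timeout(elapsed) => Retry::After(*elapsed * 2),
        OperatorError::MissingField(_) | OperatorError::InvalidGpuType(_) => {
            Retry::WaitForSpecChange
        }
    }
}
```

In kube-rs, `WaitForSpecChange` maps naturally to `Action::await_change()`, because editing the spec triggers a new reconcile anyway.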

When to Choose Rust Over Go for Operators

Choose Rust when:

Memory-constrained environments (edge, IoT, large fleets)
High reconciliation frequency (sub-second loops)
Operating on thousands of resources per controller
Security-critical operators (safe Rust rules out buffer overflows and data races at compile time)
You need predictable latency without GC pauses

Stick with Go when:

Team is already proficient in Go
Using Operator SDK / Kubebuilder scaffolding
Rapid prototyping of operator logic
Ecosystem integration matters more than performance

Production Deployment

# Multi-stage build for minimal image
FROM rust:1.82-slim AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/inference-operator /
ENTRYPOINT ["/inference-operator"]

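The final image is small, but note that the table's 15 MB figure assumes a scratch base. A scratch base needs a statically linked binary (for example, one built for the x86_64-unknown-linux-musl target). The distroless/cc image above adds a glibc base layer on top of the binary. Typical Go operator images run 50-100MB.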

The kube-rs ecosystem is mature enough for production. Start with kube.rs documentation and the examples repository.
