Go has been the default language for Kubernetes operators since the ecosystem began. But Rust operators are gaining traction in 2026, not for ideological reasons but for measurable production benefits: roughly 10x lower memory use, no GC pauses during reconciliation, and compile-time guarantees that prevent entire categories of runtime panics.
Why Rust for Operators?
A typical Go operator managing 10,000 custom resources consumes 200-500MB of RAM. The equivalent Rust operator: 20-50MB. When you're running operators on every cluster in a fleet, that difference is the cost of an extra node.
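To make the fleet-level arithmetic concrete, here is a back-of-the-envelope sketch. The per-operator figures used in the note below are the midpoints of the ranges quoted above (350MB and 35MB); the fleet shape (200 clusters, 5 operators per cluster) and the function name are hypothetical assumptions chosen only for illustration.

```rust
/// Memory saved across a fleet, in MB, when every operator instance
/// shrinks from `go_mb` to `rust_mb`. Returns 0 if there is no saving.
fn fleet_savings_mb(clusters: u64, operators_per_cluster: u64, go_mb: u64, rust_mb: u64) -> u64 {
    // saturating_sub avoids underflow if rust_mb happens to exceed go_mb
    clusters * operators_per_cluster * go_mb.saturating_sub(rust_mb)
}
```

With these assumed inputs, `fleet_savings_mb(200, 5, 350, 35)` gives 315,000 MB, or 1,575 MB per cluster. Whether that amounts to a full node depends on your node size and on how many operators you actually run.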
Beyond memory:
- No garbage collection pauses during reconciliation: predictable latency
- Compile-time checked API types: a missing or mistyped field surfaces as an explicit deserialization error rather than a silently zeroed value
- Single binary deployment: no runtime dependencies, minimal container image
- Thread safety guarantees: ownership and the `Send`/`Sync` traits rule out data races between concurrent reconcilers in safe code
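The thread-safety point can be illustrated with the standard library alone. This is a minimal sketch, not kube-rs code: plain OS threads stand in for concurrent reconcilers (kube-rs runs them as async tasks), and the function name `count_reconciles` is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each record `per_worker` completed
/// reconciliations in a shared counter, then returns the total.
fn count_reconciles(workers: usize, per_worker: u64) -> u64 {
    // Arc gives shared ownership across threads; Mutex serializes access.
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // Wait for every worker before reading the final value.
    for handle in handles {
        handle.join().unwrap();
    }
    let n = *total.lock().unwrap();
    n
}
```

The design choice is forced by the compiler: capturing a plain `&mut u64` in several spawned closures, or sharing an `Rc<Cell<u64>>` across threads, is rejected at compile time, so the unsynchronized version never reaches production.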
The kube-rs Ecosystem
The Rust Kubernetes ecosystem centers on kube-rs:
use kube::{Api, Client, CustomResource};
use kube::runtime::controller::{Action, Controller};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::sync::Arc;

// Define your Custom Resource
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(group = "ai.example.com", version = "v1", kind = "InferenceService")]
#[kube(namespaced, status = "InferenceServiceStatus")]
pub struct InferenceServiceSpec {
    pub model: String,
    pub replicas: i32,
    pub gpu_type: String,
    pub max_batch_size: Option<u32>,
}

#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
pub struct InferenceServiceStatus {
    pub ready_replicas: i32,
    pub endpoint: Option<String>,
    pub phase: String,
}
The Reconciliation Loop
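// Additional imports (Action, Api, Arc and Client come from the first listing)
use k8s_openapi::api::apps::v1::Deployment;
use kube::api::{PatchParams, PostParams};
use kube::ResourceExt; // provides namespace() and name_any()
use std::time::Duration;

// Shared state handed to every reconcile call
struct Context {
    client: Client,
}

// Assumption: `Error` aliases the OperatorError enum shown under Error Handling Patterns
type Error = OperatorError;

// needs_update, build_deployment_patch, build_deployment and update_status
// are application-specific helpers that are not shown here.
async fn reconcile(
    svc: Arc<InferenceService>,
    ctx: Arc<Context>,
) -> Result<Action, Error> {
    let client = &ctx.client;
    let ns = svc.namespace().unwrap_or_default();
    let name = svc.name_any();

    // Ensure deployment exists
    let deployments: Api<Deployment> = Api::namespaced(client.clone(), &ns);
    match deployments.get_opt(&name).await? {
        Some(existing) => {
            // Update if spec changed
            if needs_update(&existing, &svc.spec) {
                // `patch` must be a kube::api::Patch value, e.g. Patch::Apply for server-side apply
                let patch = build_deployment_patch(&svc);
                deployments.patch(&name, &PatchParams::apply("inference-operator"), &patch).await?;
            }
        }
        None => {
            // Create deployment
            let deployment = build_deployment(&svc);
            deployments.create(&PostParams::default(), &deployment).await?;
        }
    }

    // Update status
    update_status(client, &svc).await?;

    // Requeue after 30 seconds
    Ok(Action::requeue(Duration::from_secs(30)))
}

// Called when reconcile returns Err: log the failure and retry in 60 seconds
fn error_policy(
    _svc: Arc<InferenceService>,
    error: &Error,
    _ctx: Arc<Context>,
) -> Action {
    tracing::error!(%error, "reconciliation failed");
    Action::requeue(Duration::from_secs(60))
}
Setting Up the Controller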
use futures::StreamExt; // brings for_each into scope for the controller stream

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Install the default formatting subscriber for tracing output
    tracing_subscriber::fmt::init();

    let client = Client::try_default().await?;
    let services: Api<InferenceService> = Api::all(client.clone());
    let deployments: Api<Deployment> = Api::all(client.clone());
    let ctx = Arc::new(Context { client });

    // Watch InferenceService objects and the Deployments they own
    Controller::new(services, Default::default())
        .owns(deployments, Default::default())
        .run(reconcile, error_policy, ctx)
        .for_each(|result| async move {
            match result {
                Ok((_obj, _action)) => {}
                Err(e) => tracing::error!(%e, "controller error"),
            }
        })
        .await;

    Ok(())
}
Memory Comparison: Go vs Rust Operator
Benchmarked with 5,000 custom resources under watch:
| Metric | Go (controller-runtime) | Rust (kube-rs) |
|---|---|---|
| RSS at startup | 45 MB | 8 MB |
| RSS at 5K resources | 320 MB | 28 MB |
| P99 reconcile latency | 12 ms | 2 ms |
| GC pause (max) | 15 ms | 0 ms (no GC) |
| Binary size | 35 MB | 12 MB |
| Container image | 50 MB (distroless) | 15 MB (scratch) |
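The source does not say how the P99 figure was measured. As a hedged illustration of what the metric means, the sketch below computes a nearest-rank percentile over recorded reconcile durations. The function name and the nearest-rank definition are assumptions, not the benchmark harness behind the table.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns `None` for an empty slice or a `pct` outside 1..=100.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // ceil(pct * n / 100) in integer arithmetic; always within 1..=n here.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

In an operator you would push one `Duration` per reconcile call (for example, measured with `std::time::Instant`) and report `percentile(&samples, 99)`. A production setup would more likely export a histogram metric than keep raw samples in memory.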
Error Handling Patterns
Rust's type system shines for operator reliability:
#[derive(Debug, thiserror::Error)]
enum OperatorError {
    #[error("kubernetes API error: {0}")]
    Kube(#[from] kube::Error),
    #[error("missing field: {0}")]
    MissingField(&'static str),
    #[error("invalid GPU type: {0} (expected: A100, H100, L40S)")]
    InvalidGpuType(String),
    #[error("reconciliation timeout after {0:?}")]
    Timeout(Duration),
}
// A `match` on OperatorError must cover every variant, so adding a new
// failure mode is a compile error until each call site handles it.
// Fallible calls return Result instead of panicking, although explicit
// unwrap() calls and out-of-bounds indexing can still panic.
When to Choose Rust Over Go for Operators
Choose Rust when:
- Memory-constrained environments (edge, IoT, large fleets)
- High reconciliation frequency (sub-second loops)
- Operating on thousands of resources per controller
- Security-critical operators (safe Rust rules out buffer overflows)
- Workloads that need predictable latency without GC pauses
Stick with Go when:
- Team is already proficient in Go
- Using Operator SDK / Kubebuilder scaffolding
- Rapid prototyping of operator logic
- Ecosystem integration matters more than performance
Production Deployment
# Multi-stage build for minimal image
FROM rust:1.82-slim AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/inference-operator /
ENTRYPOINT ["/inference-operator"]
The final image is under 20MB, compared with 50-100MB for typical Go operators.
Related Articles
- Rust in 2026: Systems Programming (the broader Rust ecosystem)
- Kubernetes Operators Best Practices (patterns regardless of language)
- Multi-Tenant GPUs on Bare Metal (our KubeCon talk on GPU scheduling)
The kube-rs ecosystem is mature enough for production. Start with the kube.rs documentation and the examples repository.