What AI and cloud consulting services does Luca Berton offer?

Luca Berton provides expert consulting in AI/ML platform strategy, multi-tenant GPU orchestration on OpenShift AI, MLOps enablement, cloud infrastructure design, Kubernetes workshops, and Ansible & Python training.

What is Ansible Pilot?

Ansible Pilot is the leading resource for Ansible automation learning, featuring a YouTube channel with 6.1K subscribers and 1M+ views, plus AnsiblePilot.com with 648K total users.

How can I book a consultation with Luca Berton?

Schedule a free consultation through Calendly at calendly.com/lucaberton or visit lucaberton.com/contact.

Rust Async Runtime Deep Dive: Tokio vs async-std vs smol

Choosing an async runtime in Rust determines your application’s concurrency model, ecosystem compatibility, and performance characteristics. In 2026, Tokio dominates — but understanding the alternatives helps you make informed architectural decisions.

The Runtime Landscape

Runtime	Use Case	Worker Threads	IO Model
Tokio	General purpose, networking	Multi-threaded work-stealing	epoll/kqueue/IOCP
async-std	Simpler API, mirrors std	Multi-threaded	epoll/kqueue
smol	Minimal, composable	Configurable	epoll/kqueue
Embassy	Embedded/no-std	Single-threaded	HAL interrupts
glommio	Thread-per-core, io_uring	Thread-per-core	io_uring

Tokio: The Default Choice

90% of Rust async projects use Tokio. Here’s why:

use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8080").await?;

    loop {
        let (mut socket, addr) = listener.accept().await?;
        tokio::spawn(async move {
            let mut buf = [0; 4096];
            loop {
                let n = match socket.read(&mut buf).await {
                    Ok(0) => return, // connection closed
                    Ok(n) => n,
                    Err(_) => return,
                };
                if socket.write_all(&buf[..n]).await.is_err() {
                    return;
                }
            }
        });
    }
}

Tokio’s Architecture

Work-stealing scheduler: Tasks are distributed across worker threads. If one thread is idle, it steals work from busy threads.
IO driver: Uses epoll (Linux), kqueue (macOS), IOCP (Windows) for non-blocking IO.
Timer wheel: Efficient timeout management for thousands of concurrent timers.
Channel primitives: mpsc, oneshot, broadcast, watch — all async-aware.

When Tokio Shines

High-connection-count servers (10K+ concurrent connections)
Mixed CPU and IO workloads
Complex task graphs with inter-task communication
Ecosystem compatibility (hyper, tonic, axum, reqwest all require Tokio)

async-std: The Simpler Alternative

use async_std::net::TcpListener;
use async_std::prelude::*;
use async_std::task;

fn main() -> Result<(), std::io::Error> {
    task::block_on(async {
        let listener = TcpListener::bind("0.0.0.0:8080").await?;
        let mut incoming = listener.incoming();

        while let Some(stream) = incoming.next().await {
            let stream = stream?;
            task::spawn(handle_connection(stream));
        }
        Ok(())
    })
}

Why async-std?

API mirrors std — lower learning curve
No macro magic (#[tokio::main] vs explicit block_on)
Adequate for simpler services
Smaller dependency tree

Why NOT async-std?

Smaller ecosystem (many crates are Tokio-only)
Less active development in 2026
Performance gap in high-throughput scenarios

smol: Minimal and Composable

use smol::{Async, io};
use std::net::TcpListener;

fn main() -> io::Result<()> {
    smol::block_on(async {
        let listener = Async::<TcpListener>::bind(([0, 0, 0, 0], 8080))?;

        loop {
            let (stream, _) = listener.accept().await?;
            smol::spawn(async move {
                io::copy(&stream, &mut &stream).await.ok();
            }).detach();
        }
    })
}

smol’s Philosophy

Under 1,500 lines of code — you can read the entire runtime
Bring your own executor — thread pool is configurable
Composable — works with any futures, not just its own types
Uses async-io crate for the reactor, async-executor for the executor

When smol Makes Sense

Educational projects (understandable codebase)
Minimal binaries where Tokio’s dependency weight matters
Custom scheduling requirements
Libraries that shouldn’t impose a runtime

glommio: Thread-Per-Core with io_uring

For maximum throughput on Linux:

use glommio::prelude::*;

fn main() {
    LocalExecutorBuilder::default()
        .spawn(|| async move {
            let listener = TcpListener::bind("0.0.0.0:8080")?;
            loop {
                let stream = listener.accept().await?;
                spawn_local(handle_stream(stream)).detach();
            }
        })
        .unwrap()
        .join()
        .unwrap();
}

Why Thread-Per-Core?

No synchronization overhead — each core owns its data
io_uring — kernel bypass for IO operations (zero syscalls in hot path)
Predictable latency — no work-stealing jitter
Used by ScyllaDB, Redpanda, and Datadog for maximum throughput

When to Use glommio

Linux-only deployments
Throughput-critical services (databases, proxies, message brokers)
When you need deterministic tail latency
Kernel 5.8+ environments

Performance Comparison

Benchmarked on 128-core AMD EPYC, 10Gbps networking, 50K concurrent connections:

Runtime	Requests/sec	P50 Latency	P99 Latency	Memory
Tokio	1.2M	0.8ms	4.2ms	85MB
async-std	900K	1.1ms	6.8ms	92MB
smol	1.0M	0.9ms	5.1ms	45MB
glommio	1.8M	0.4ms	1.2ms	120MB

glommio wins on throughput and latency but requires more memory per thread (each thread has its own io_uring submission queue).

Practical Patterns

Graceful Shutdown (Tokio)

use tokio::signal;
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    let (shutdown_tx, shutdown_rx) = watch::channel(false);

    let server = tokio::spawn(run_server(shutdown_rx.clone()));
    let workers = tokio::spawn(run_workers(shutdown_rx));

    signal::ctrl_c().await.unwrap();
    shutdown_tx.send(true).unwrap();

    // Wait for graceful drain
    tokio::time::timeout(
        Duration::from_secs(30),
        futures::future::join(server, workers),
    ).await.ok();
}

Structured Concurrency

use tokio::task::JoinSet;

async fn process_batch(items: Vec<Item>) -> Vec<Result<Output, Error>> {
    let mut set = JoinSet::new();

    for item in items {
        set.spawn(async move {
            process_item(item).await
        });
    }

    let mut results = Vec::new();
    while let Some(result) = set.join_next().await {
        results.push(result.unwrap());
    }
    results
}

Rate Limiting

use tokio::sync::Semaphore;
use std::sync::Arc;

let semaphore = Arc::new(Semaphore::new(100)); // max 100 concurrent

for url in urls {
    let permit = semaphore.clone().acquire_owned().await.unwrap();
    tokio::spawn(async move {
        let result = fetch(url).await;
        drop(permit); // release slot
        result
    });
}

My Recommendation

Start with Tokio — ecosystem compatibility alone justifies it
Consider glommio — if you’re building a database, proxy, or broker on Linux
Use smol — for libraries that shouldn’t force a runtime on users
Avoid async-std — for new projects in 2026, the ecosystem has moved to Tokio

Rust in 2026 — the broader ecosystem
Rust vs Go — language choice for infrastructure
Pixi Package Manager — real-world Rust tooling

Understanding your async runtime is as important as understanding your allocator. Both are invisible until they’re the bottleneck.