Skip to main content
🎓 Claude Code Masterclass Learn AI-assisted development on Udemy — plus the companion book on Leanpub & Amazon. Start Learning
Rust async runtimes comparison Tokio async-std smol
Open Source

Rust Async Runtime Deep Dive: Tokio vs async-std vs smol

A production-focused comparison of Rust async runtimes for infrastructure development. Covers Tokio's dominance, when async-std or smol make sense, and.

LB
Luca Berton
· 3 min read

Choosing an async runtime in Rust determines your application’s concurrency model, ecosystem compatibility, and performance characteristics. In 2026, Tokio dominates — but understanding the alternatives helps you make informed architectural decisions.

The Runtime Landscape

RuntimeUse CaseWorker ThreadsIO Model
TokioGeneral purpose, networkingMulti-threaded work-stealingepoll/kqueue/IOCP
async-stdSimpler API, mirrors stdMulti-threadedepoll/kqueue
smolMinimal, composableConfigurableepoll/kqueue
EmbassyEmbedded/no-stdSingle-threadedHAL interrupts
glommioThread-per-core, io_uringThread-per-coreio_uring

Tokio: The Default Choice

90% of Rust async projects use Tokio. Here’s why:

use tokio::net::TcpListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8080").await?;

    loop {
        let (mut socket, addr) = listener.accept().await?;
        tokio::spawn(async move {
            let mut buf = [0; 4096];
            loop {
                let n = match socket.read(&mut buf).await {
                    Ok(0) => return, // connection closed
                    Ok(n) => n,
                    Err(_) => return,
                };
                if socket.write_all(&buf[..n]).await.is_err() {
                    return;
                }
            }
        });
    }
}

Tokio’s Architecture

  • Work-stealing scheduler: Tasks are distributed across worker threads. If one thread is idle, it steals work from busy threads.
  • IO driver: Uses epoll (Linux), kqueue (macOS), IOCP (Windows) for non-blocking IO.
  • Timer wheel: Efficient timeout management for thousands of concurrent timers.
  • Channel primitives: mpsc, oneshot, broadcast, watch — all async-aware.

When Tokio Shines

  • High-connection-count servers (10K+ concurrent connections)
  • Mixed CPU and IO workloads
  • Complex task graphs with inter-task communication
  • Ecosystem compatibility (hyper, tonic, axum, reqwest all require Tokio)

async-std: The Simpler Alternative

use async_std::net::TcpListener;
use async_std::prelude::*;
use async_std::task;

fn main() -> Result<(), std::io::Error> {
    task::block_on(async {
        let listener = TcpListener::bind("0.0.0.0:8080").await?;
        let mut incoming = listener.incoming();

        while let Some(stream) = incoming.next().await {
            let stream = stream?;
            task::spawn(handle_connection(stream));
        }
        Ok(())
    })
}

Why async-std?

  • API mirrors std — lower learning curve
  • No macro magic (#[tokio::main] vs explicit block_on)
  • Adequate for simpler services
  • Smaller dependency tree

Why NOT async-std?

  • Smaller ecosystem (many crates are Tokio-only)
  • Less active development in 2026
  • Performance gap in high-throughput scenarios

smol: Minimal and Composable

use smol::{Async, io};
use std::net::TcpListener;

fn main() -> io::Result<()> {
    smol::block_on(async {
        let listener = Async::<TcpListener>::bind(([0, 0, 0, 0], 8080))?;

        loop {
            let (stream, _) = listener.accept().await?;
            smol::spawn(async move {
                io::copy(&stream, &mut &stream).await.ok();
            }).detach();
        }
    })
}

smol’s Philosophy

  • Under 1,500 lines of code — you can read the entire runtime
  • Bring your own executor — thread pool is configurable
  • Composable — works with any futures, not just its own types
  • Uses async-io crate for the reactor, async-executor for the executor

When smol Makes Sense

  • Educational projects (understandable codebase)
  • Minimal binaries where Tokio’s dependency weight matters
  • Custom scheduling requirements
  • Libraries that shouldn’t impose a runtime

glommio: Thread-Per-Core with io_uring

For maximum throughput on Linux:

use glommio::prelude::*;

fn main() {
    LocalExecutorBuilder::default()
        .spawn(|| async move {
            let listener = TcpListener::bind("0.0.0.0:8080")?;
            loop {
                let stream = listener.accept().await?;
                spawn_local(handle_stream(stream)).detach();
            }
        })
        .unwrap()
        .join()
        .unwrap();
}

Why Thread-Per-Core?

  • No synchronization overhead — each core owns its data
  • io_uring — kernel bypass for IO operations (zero syscalls in hot path)
  • Predictable latency — no work-stealing jitter
  • Used by ScyllaDB, Redpanda, and Datadog for maximum throughput

When to Use glommio

  • Linux-only deployments
  • Throughput-critical services (databases, proxies, message brokers)
  • When you need deterministic tail latency
  • Kernel 5.8+ environments

Performance Comparison

Benchmarked on 128-core AMD EPYC, 10Gbps networking, 50K concurrent connections:

RuntimeRequests/secP50 LatencyP99 LatencyMemory
Tokio1.2M0.8ms4.2ms85MB
async-std900K1.1ms6.8ms92MB
smol1.0M0.9ms5.1ms45MB
glommio1.8M0.4ms1.2ms120MB

glommio wins on throughput and latency but requires more memory per thread (each thread has its own io_uring submission queue).

Practical Patterns

Graceful Shutdown (Tokio)

use tokio::signal;
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    let (shutdown_tx, shutdown_rx) = watch::channel(false);

    let server = tokio::spawn(run_server(shutdown_rx.clone()));
    let workers = tokio::spawn(run_workers(shutdown_rx));

    signal::ctrl_c().await.unwrap();
    shutdown_tx.send(true).unwrap();

    // Wait for graceful drain
    tokio::time::timeout(
        Duration::from_secs(30),
        futures::future::join(server, workers),
    ).await.ok();
}

Structured Concurrency

use tokio::task::JoinSet;

async fn process_batch(items: Vec<Item>) -> Vec<Result<Output, Error>> {
    let mut set = JoinSet::new();

    for item in items {
        set.spawn(async move {
            process_item(item).await
        });
    }

    let mut results = Vec::new();
    while let Some(result) = set.join_next().await {
        results.push(result.unwrap());
    }
    results
}

Rate Limiting

use tokio::sync::Semaphore;
use std::sync::Arc;

let semaphore = Arc::new(Semaphore::new(100)); // max 100 concurrent

for url in urls {
    let permit = semaphore.clone().acquire_owned().await.unwrap();
    tokio::spawn(async move {
        let result = fetch(url).await;
        drop(permit); // release slot
        result
    });
}

My Recommendation

  1. Start with Tokio — ecosystem compatibility alone justifies it
  2. Consider glommio — if you’re building a database, proxy, or broker on Linux
  3. Use smol — for libraries that shouldn’t force a runtime on users
  4. Avoid async-std — for new projects in 2026, the ecosystem has moved to Tokio

Understanding your async runtime is as important as understanding your allocator. Both are invisible until they’re the bottleneck.

#Rust #Async #Performance #Concurrency
Share:

📬 Don't miss the next one

Get AI & Cloud insights delivered weekly

Join engineers getting practical tips on AI, Kubernetes, Ansible, and Platform Engineering.

Subscribe Free →
Luca Berton — AI & Cloud Advisor, Docker Captain

Luca Berton

AI & Cloud Advisor · Docker Captain · KubeCon Speaker

18+ years in enterprise infrastructure. Author of 8 technical books, creator of Ansible Pilot (1M+ YouTube views, 648K site users). Former Red Hat engineer. Speaker at KubeCon EU 2026 and Red Hat Summit 2026.

Free 30-min AI & Cloud consultation

Book Now