
JSON vs TOON for AI Input: Token-Efficient Data for LLMs

Luca Berton • 5 min read
#json#toon#llm#tokens#ai

Why Token Count Matters for AI Input

Every token you send to an LLM costs money and consumes context window space. When you feed it structured data (user records, product catalogs, API responses, configuration), the format you use directly impacts how many tokens you burn on syntax overhead versus actual content.

JSON is the default. It works. But for LLM input specifically, a significant chunk of your tokens goes toward braces, brackets, repeated key names, and quotation marks that carry zero information value.

TOON (Token-Oriented Object Notation) was designed to solve exactly this problem: a compact, human-readable encoding of the JSON data model that minimizes tokens while remaining a lossless, drop-in replacement.

What Is TOON?

TOON combines two familiar ideas:

  • YAML-style indentation for nested objects: no braces, no commas
  • CSV-style tabular layout for uniform arrays: headers declared once in curly braces, data as comma-separated rows

The result is a format that represents the exact same data as JSON but uses dramatically fewer tokens. It is not a new data model; it is a translation layer. Your application still works with JSON programmatically; you encode to TOON only when sending data to an LLM.

The TOON reference implementation is available as an npm package (toon-lang), and the spec is at version 3.0.

TOON Syntax at a Glance

TOON has three core constructs:

  1. Indented key-value pairs for objects (like YAML)
  2. Square brackets with count for simple arrays: items[3]: a,b,c
  3. Curly braces with headers for arrays of objects: items[3]{col1,col2,col3}: followed by CSV rows

Here is a quick example:

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

The equivalent JSON:

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320, "companion": "ana", "wasSunny": true},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "elevationGain": 540, "companion": "luis", "wasSunny": false},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1, "elevationGain": 180, "companion": "sam", "wasSunny": true}
  ]
}

Count the tokens. The JSON version repeats each of "id", "name", "distanceKm", "elevationGain", "companion", and "wasSunny" three times: 18 key occurrences in total. TOON declares them once in the header.
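You can sanity-check the gap without a real tokenizer. A rough sketch using the hikes example above, assuming roughly 4 characters per token (a common approximation; exact counts vary by model and tokenizer):

```python
import json

# Sample data mirroring the hikes example above.
hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5,
     "elevationGain": 320, "companion": "ana", "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2,
     "elevationGain": 540, "companion": "luis", "wasSunny": False},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1,
     "elevationGain": 180, "companion": "sam", "wasSunny": True},
]

json_text = json.dumps(hikes)

headers = list(hikes[0].keys())

def fmt(v):
    # TOON uses lowercase booleans, like JSON.
    return str(v).lower() if isinstance(v, bool) else str(v)

rows = ["  " + ",".join(fmt(h[k]) for k in headers) for h in hikes]
toon_text = f"hikes[{len(hikes)}]{{{','.join(headers)}}}:\n" + "\n".join(rows)

# Rough heuristic: about 4 characters per token for English-like text.
def approx_tokens(s: str) -> int:
    return max(1, len(s) // 4)

print(f"JSON ~{approx_tokens(json_text)} tokens, "
      f"TOON ~{approx_tokens(toon_text)} tokens")
```

The key names appear once in the TOON string and three times in the JSON string, which is where most of the difference comes from.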

JSON vs TOON: AI Workflow Examples

Example 1: GPU Instance Fleet for Cost Analysis

JSON:

[
  {"name": "gpu-worker-01", "type": "g5.xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.006, "status": "running"},
  {"name": "gpu-worker-02", "type": "g5.2xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.212, "status": "running"},
  {"name": "gpu-worker-03", "type": "p4d.24xlarge", "gpu": "A100", "vram_gb": 320, "hourly_cost": 32.77, "status": "stopped"},
  {"name": "inference-01", "type": "g5.xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.006, "status": "running"}
]

TOON:

instances[4]{name,type,gpu,vram_gb,hourly_cost,status}:
  gpu-worker-01,g5.xlarge,A10G,24,1.006,running
  gpu-worker-02,g5.2xlarge,A10G,24,1.212,running
  gpu-worker-03,p4d.24xlarge,A100,320,32.77,stopped
  inference-01,g5.xlarge,A10G,24,1.006,running

Six field names repeated four times in JSON (24 occurrences) versus declared once in TOON. Roughly 50 percent fewer tokens.

Example 2: Training Experiment Results

Feeding 30 experiment runs to an LLM for analysis is a perfect TOON use case.

JSON (per row: ~45 tokens):

[
  {"run_id": "exp-001", "lr": 2e-5, "batch": 32, "epochs": 3, "loss": 0.342, "accuracy": 0.891, "f1": 0.876},
  {"run_id": "exp-002", "lr": 5e-5, "batch": 32, "epochs": 3, "loss": 0.298, "accuracy": 0.912, "f1": 0.901},
  {"run_id": "exp-003", "lr": 2e-5, "batch": 64, "epochs": 5, "loss": 0.267, "accuracy": 0.923, "f1": 0.918}
]

TOON (per row: ~15 tokens):

experiments[3]{run_id,lr,batch,epochs,loss,accuracy,f1}:
  exp-001,2e-5,32,3,0.342,0.891,0.876
  exp-002,5e-5,32,3,0.298,0.912,0.901
  exp-003,2e-5,64,5,0.267,0.923,0.918

At 30 experiments, JSON uses roughly 1,350 tokens. TOON uses roughly 500. That is a 63 percent saving, enough to fit twice as many experiments in the same context window.

Example 3: Nested Model Configuration

JSON:

{
  "model": {
    "name": "llama-3.1-70b",
    "backend": "vllm",
    "quantization": "awq-4bit",
    "max_model_len": 8192
  },
  "inference": {
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 2048
  },
  "lora": {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.05
  }
}

TOON:

model:
  name: llama-3.1-70b
  backend: vllm
  quantization: awq-4bit
  max_model_len: 8192
inference:
  temperature: 0.7
  top_p: 0.95
  max_tokens: 2048
lora:
  rank: 16
  alpha: 32
  dropout: 0.05

For shallowly nested configuration objects, savings are more modest (roughly 20-25 percent) since there are no repeated keys. But the readability improvement is still significant: no closing braces, no commas, no quotation marks on keys.
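For this key-value case, an encoder fits in a few lines. `dict_to_toon` is a hypothetical sketch, not part of any TOON library; it handles scalars and nested dicts only, with no array support:

```python
def dict_to_toon(obj: dict, indent: int = 0) -> str:
    """Render a nested dict as TOON-style indented key-value pairs.

    Sketch only: scalars and nested dicts, no arrays or quoting.
    """
    pad = "  " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(dict_to_toon(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)

config = {
    "model": {"name": "llama-3.1-70b", "backend": "vllm"},
    "inference": {"temperature": 0.7, "top_p": 0.95},
}
print(dict_to_toon(config))
```

Each nesting level simply becomes two spaces of indentation, which is why the savings here come only from dropped punctuation rather than deduplicated keys.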

When TOON Wins

TOON's sweet spot is uniform arrays of objects, the kind of data you encounter constantly in AI workflows:

  • Training logs: runs with consistent columns (hyperparameters, metrics)
  • Evaluation results: model comparisons with the same fields per row
  • Infrastructure inventory: instances, nodes, GPUs with consistent schemas
  • Benchmark results: model name, task, score, latency per row
  • User/product data: profiles or catalog entries for personalization

The pattern: multiple items, same structure, many fields per item. CSV compactness with explicit typing and structure.

When JSON Still Wins

TOON is not universally better. Use JSON when:

  • Data is deeply nested: three or more levels of nesting
  • Arrays are non-uniform: items have different fields, so no tabular layout is possible
  • The LLM must output structured data: most LLMs are trained to output JSON, not TOON
  • Programmatic consumption: your code reads JSON directly; TOON is a presentation layer
  • Schema validation is needed: JSON Schema is mature; TOON tooling is early stage
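A practical consequence of these two lists is a simple dispatcher: emit TOON when an array is uniform, fall back to JSON otherwise. `encode_for_llm` is a hypothetical helper sketch, not part of any TOON library; comma-containing values and nesting are not handled:

```python
import json

def encode_for_llm(name: str, data) -> str:
    """Use TOON's tabular form when data is a uniform list of flat dicts;
    otherwise fall back to plain JSON."""
    uniform = (isinstance(data, list) and len(data) > 0
               and all(isinstance(item, dict) for item in data)
               and all(item.keys() == data[0].keys() for item in data))
    if not uniform:
        return json.dumps(data, indent=2)
    headers = list(data[0].keys())
    rows = ["  " + ",".join(str(item[h]) for h in headers) for item in data]
    return f"{name}[{len(data)}]{{{','.join(headers)}}}:\n" + "\n".join(rows)

# Uniform rows get the compact tabular layout...
print(encode_for_llm("runs", [{"id": 1, "loss": 0.3}, {"id": 2, "loss": 0.2}]))
# ...while ragged data stays as JSON.
print(encode_for_llm("mixed", [{"id": 1}, {"name": "x", "id": 2}]))
```

This keeps the decision automatic: you never hand-pick a format per payload, and non-tabular data degrades gracefully to JSON.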

Using TOON in Practice

TOON is a translation layer, not a storage format. The workflow:

# npm install toon-lang (Node.js reference implementation)
# Python: encode JSON to TOON before sending to the LLM

import json

def format_value(value) -> str:
    """Render a scalar the way TOON expects: lowercase booleans, bare strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def json_array_to_toon(name: str, data: list[dict]) -> str:
    """Convert a uniform array of objects to TOON tabular format.

    Assumes every item shares the first item's keys and that no value
    contains a comma (quoting is not handled here).
    """
    if not data:
        return ""
    headers = list(data[0].keys())
    header_line = f"{name}[{len(data)}]{{{','.join(headers)}}}:"
    rows = ["  " + ",".join(format_value(item.get(h, "")) for h in headers)
            for item in data]
    return header_line + "\n" + "\n".join(rows)

# Your data stays as JSON internally
with open("experiments.json") as f:
    experiments = json.load(f)

# Convert to TOON only for LLM input
toon_input = json_array_to_toon("experiments", experiments)

prompt = f"""Analyze these experiment results and recommend the best configuration:

{toon_input}

Which run achieved the best balance of accuracy and training efficiency?"""

Your application keeps working with JSON. TOON is only used at the LLM boundary.

TOON for Infrastructure and Ansible Data

For those of us working with Ansible automation and cloud infrastructure, TOON is particularly useful for sending inventory data to an LLM for analysis or troubleshooting:

inventory[5]{hostname,role,os,cpu,ram_gb,datacenter,status}:
  web-prod-01,webserver,rhel-9.3,8,32,eu-west-1,healthy
  web-prod-02,webserver,rhel-9.3,8,32,eu-west-1,healthy
  db-prod-01,database,rhel-9.3,16,128,eu-west-1,warning
  k8s-worker-01,kubernetes,rhel-9.4,32,256,eu-west-1,healthy
  k8s-worker-02,kubernetes,rhel-9.4,32,256,eu-central,healthy

An LLM can immediately identify the warning status on db-prod-01 and suggest Kubernetes-based remediation or Terraform infrastructure changes, all while consuming roughly half the tokens of the JSON equivalent.

LLM Compatibility

Can LLMs actually parse TOON reliably? Yes, because TOON deliberately uses patterns LLMs already understand:

  • CSV-style rows: LLMs have seen billions of CSV examples in training data
  • YAML-style indentation: deeply familiar from Python and configuration files
  • Explicit headers: the {field1,field2,field3} declaration removes all ambiguity

In practice, models such as GPT-4, Claude, and Llama 3 parse TOON input correctly without special prompting. The format is self-describing: models infer the structure immediately from the header declaration.
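One way to see why the header makes the format self-describing is to parse it mechanically. Below is a minimal sketch of a parser for a single tabular block, assuming no quoting, nesting, or escaped commas:

```python
def parse_toon_table(text: str) -> list[dict]:
    """Parse one TOON tabular block back into a list of dicts.

    Sketch only: no quoting, nesting, or escaped-comma support.
    """
    lines = text.strip().splitlines()
    header = lines[0]  # e.g. "experiments[2]{run_id,lr,loss}:"
    fields = header[header.index("{") + 1 : header.index("}")].split(",")

    def coerce(value: str):
        # Infer booleans and numbers; fall back to the raw string.
        if value in ("true", "false"):
            return value == "true"
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value

    return [dict(zip(fields, map(coerce, line.strip().split(","))))
            for line in lines[1:]]

toon = """experiments[2]{run_id,lr,loss}:
  exp-001,2e-5,0.342
  exp-002,5e-5,0.298"""
rows = parse_toon_table(toon)
```

Everything a parser (or a model) needs, including the field names and the row count, sits on the first line.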

Token Savings Summary

Realistic token savings for common AI input patterns:

  • 10-row uniform array, 6 fields: JSON ~350 tokens, TOON ~150 tokens (57% reduction)
  • 50-row dataset, 8 fields: JSON ~2,400 tokens, TOON ~900 tokens (63% reduction)
  • Nested config, 2 levels: JSON ~120 tokens, TOON ~90 tokens (25% reduction)
  • 100-row dataset, 10 fields: JSON ~6,000 tokens, TOON ~2,000 tokens (67% reduction)

The savings scale with row count. More rows, more savings, because headers are declared once regardless of data size.
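That scaling can be made concrete with a back-of-envelope model. The per-key and per-value token constants below are illustrative assumptions, not measurements; real numbers depend on the tokenizer and the data:

```python
def estimated_savings(n_rows: int, n_fields: int,
                      key_tokens: int = 3, value_tokens: int = 2) -> float:
    """Back-of-envelope fractional token savings of TOON vs JSON
    for a uniform table. Assumes each JSON key costs ~key_tokens
    (name, quotes, colon) and each value ~value_tokens."""
    json_tokens = n_rows * n_fields * (key_tokens + value_tokens)
    # TOON pays for each field name once (header), then values only.
    toon_tokens = n_fields + n_rows * n_fields * value_tokens
    return 1 - toon_tokens / json_tokens

print(f"{estimated_savings(50, 8):.0%}")  # prints "60%"
```

Because the JSON cost grows with keys on every row while TOON's header cost is fixed, the estimated saving rises as rows are added, which matches the pattern in the summary above.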

The Bottom Line

Use JSON as your data format. Use TOON as your LLM input format. They are complementary:

  • JSON for storage, APIs, inter-service communication, LLM output parsing
  • TOON for feeding structured data to LLMs with minimal token overhead

TOON is not trying to replace JSON everywhere. It solves one specific problem, reducing token waste when sending structured data to language models, and it solves it well. For AI practitioners running thousands of LLM requests daily, switching tabular inputs from JSON to TOON can cut costs by 50-65 percent on those specific calls.

The spec is at v3.0 and available via npm install toon-lang. The tooling around it is still evolving, so now is a good time to experiment with it and shape where it goes.


Luca Berton

AI & Cloud Advisor with 18+ years of experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
