Why Token Count Matters for AI Input
Every token you send to an LLM costs money and consumes context window space. When you feed it structured data (user records, product catalogs, API responses, configuration), the format you use directly impacts how many tokens you burn on syntax overhead versus actual content.
JSON is the default. It works. But for LLM input specifically, a significant chunk of your tokens goes toward braces, brackets, repeated key names, and quotation marks that carry zero information value.
TOON (Token-Oriented Object Notation) was designed to solve exactly this problem: a compact, human-readable encoding of the JSON data model that minimizes tokens while remaining a lossless, drop-in replacement.
What Is TOON?
TOON combines two familiar ideas:
- YAML-style indentation for nested objects: no braces, no commas
- CSV-style tabular layout for uniform arrays: headers declared once in curly braces, data as comma-separated rows
The result is a format that represents the exact same data as JSON but uses dramatically fewer tokens. It is not a new data model; it is a translation layer. Your application still works with JSON programmatically; you encode to TOON only when sending data to an LLM.
The TOON spec is available as an npm package (toon-lang) and is at version 3.0.
TOON Syntax at a Glance
TOON has three core constructs:
- Indented key-value pairs for objects (like YAML)
- Square brackets with a count for simple arrays: `items[3]: a,b,c`
- Curly braces with headers for arrays of objects: `items[3]{col1,col2,col3}:` followed by CSV rows
Here is a quick example:
```
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
```

The equivalent JSON:
```json
{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320, "companion": "ana", "wasSunny": true},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "elevationGain": 540, "companion": "luis", "wasSunny": false},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1, "elevationGain": 180, "companion": "sam", "wasSunny": true}
  ]
}
```

Count the tokens. The JSON version serializes "id", "name", "distanceKm", "elevationGain", "companion", "wasSunny" once per row: 6 keys times 3 rows is 18 key occurrences. TOON declares them once in the header.
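That repetition is easy to check mechanically. A quick sketch using Python's standard `json` module, with the records copied from the example above:

```python
import json

# The three hike records from the example above
hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5,
     "elevationGain": 320, "companion": "ana", "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2,
     "elevationGain": 540, "companion": "luis", "wasSunny": False},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1,
     "elevationGain": 180, "companion": "sam", "wasSunny": True},
]

json_text = json.dumps(hikes)
# In JSON, every key is re-serialized once per row; count the quoted keys
key_occurrences = sum(json_text.count(f'"{k}"') for k in hikes[0])
print(key_occurrences)  # 6 keys x 3 rows = 18 occurrences
```

A TOON encoding of the same list pays for those six names exactly once, in the header line.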
JSON vs TOON: AI Workflow Examples
Example 1: GPU Instance Fleet for Cost Analysis
JSON:
```json
[
  {"name": "gpu-worker-01", "type": "g5.xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.006, "status": "running"},
  {"name": "gpu-worker-02", "type": "g5.2xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.212, "status": "running"},
  {"name": "gpu-worker-03", "type": "p4d.24xlarge", "gpu": "A100", "vram_gb": 320, "hourly_cost": 32.77, "status": "stopped"},
  {"name": "inference-01", "type": "g5.xlarge", "gpu": "A10G", "vram_gb": 24, "hourly_cost": 1.006, "status": "running"}
]
```

TOON:
```
instances[4]{name,type,gpu,vram_gb,hourly_cost,status}:
  gpu-worker-01,g5.xlarge,A10G,24,1.006,running
  gpu-worker-02,g5.2xlarge,A10G,24,1.212,running
  gpu-worker-03,p4d.24xlarge,A100,320,32.77,stopped
  inference-01,g5.xlarge,A10G,24,1.006,running
```

Six field names repeated four times in JSON (24 occurrences) versus declared once in TOON. Roughly 50 percent fewer tokens.
Example 2: Training Experiment Results
Feeding 30 experiment runs to an LLM for analysis is a perfect TOON use case.
JSON (per row: ~45 tokens):
```json
[
  {"run_id": "exp-001", "lr": 2e-5, "batch": 32, "epochs": 3, "loss": 0.342, "accuracy": 0.891, "f1": 0.876},
  {"run_id": "exp-002", "lr": 5e-5, "batch": 32, "epochs": 3, "loss": 0.298, "accuracy": 0.912, "f1": 0.901},
  {"run_id": "exp-003", "lr": 2e-5, "batch": 64, "epochs": 5, "loss": 0.267, "accuracy": 0.923, "f1": 0.918}
]
```

TOON (per row: ~15 tokens):
```
experiments[3]{run_id,lr,batch,epochs,loss,accuracy,f1}:
  exp-001,2e-5,32,3,0.342,0.891,0.876
  exp-002,5e-5,32,3,0.298,0.912,0.901
  exp-003,2e-5,64,5,0.267,0.923,0.918
```

At 30 experiments, JSON uses roughly 1,350 tokens. TOON uses roughly 500. That is a 63 percent saving, enough to fit twice as many experiments in the same context window.
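Using the rough per-row estimates above, the 63 percent figure falls out of simple arithmetic (the ~50-token TOON header allowance is my assumption, not a measured value):

```python
n_rows = 30
json_tokens = 45 * n_rows       # keys re-serialized in every row (~45/row)
toon_tokens = 50 + 15 * n_rows  # header paid once (assumed ~50), then ~15/row
savings = 1 - toon_tokens / json_tokens
print(json_tokens, toon_tokens, f"{savings:.0%}")  # 1350 500 63%
```

Note that the header term is constant, so the percentage saving only improves as the row count grows.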
Example 3: Nested Model Configuration
JSON:
```json
{
  "model": {
    "name": "llama-3.1-70b",
    "backend": "vllm",
    "quantization": "awq-4bit",
    "max_model_len": 8192
  },
  "inference": {
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 2048
  },
  "lora": {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.05
  }
}
```

TOON:
```
model:
  name: llama-3.1-70b
  backend: vllm
  quantization: awq-4bit
  max_model_len: 8192
inference:
  temperature: 0.7
  top_p: 0.95
  max_tokens: 2048
lora:
  rank: 16
  alpha: 32
  dropout: 0.05
```

For nested objects like this, with no repeated keys, the savings are more modest (roughly 20-25 percent). But the readability improvement is still significant: no closing braces, no commas, no quotation marks on keys.
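Producing this indented form is just a recursive walk over the object. A minimal sketch (`encode_nested` is a hypothetical helper for illustration, not part of any TOON library, and handles only dicts with scalar leaves):

```python
def encode_nested(obj: dict, indent: int = 0) -> str:
    """Emit YAML-style key-value lines for a nested dict of scalars."""
    lines = []
    pad = "  " * indent
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")          # section header, no braces
            lines.append(encode_nested(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")  # scalar leaf
    return "\n".join(lines)

config = {
    "model": {"name": "llama-3.1-70b", "backend": "vllm"},
    "lora": {"rank": 16, "alpha": 32},
}
print(encode_nested(config))
```

Dict insertion order is preserved in Python 3.7+, so the output follows the source layout.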
When TOON Wins
TOON's sweet spot is uniform arrays of objects, the kind of data you encounter constantly in AI workflows:
- Training logs: runs with consistent columns (hyperparameters, metrics)
- Evaluation results: model comparisons with the same fields per row
- Infrastructure inventory: instances, nodes, GPUs with consistent schemas
- Benchmark results: model name, task, score, latency per row
- User/product data: profiles or catalog entries for personalization
The pattern: multiple items, same structure, many fields per item. CSV compactness with explicit typing and structure.
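A quick way to test whether data fits that pattern is to check that every item is an object with the same key set (`is_uniform` is a hypothetical helper sketched here, not a TOON API):

```python
def is_uniform(rows: list) -> bool:
    """True if every item is a dict sharing the exact key set of the first."""
    if not rows or not isinstance(rows[0], dict):
        return False
    keys = set(rows[0])
    return all(isinstance(r, dict) and set(r) == keys for r in rows)

print(is_uniform([{"a": 1, "b": 2}, {"a": 3, "b": 4}]))  # True: tabular TOON fits
print(is_uniform([{"a": 1}, {"a": 3, "b": 4}]))          # False: keep JSON
```

If the check fails, the array falls into the non-uniform case covered next.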
When JSON Still Wins
TOON is not universally better. Use JSON when:
- Data is deeply nested: three or more levels of nesting
- Arrays are non-uniform: items have different fields, so no tabular layout is possible
- The LLM must output structured data: most LLMs are trained to output JSON, not TOON
- Programmatic consumption: your code reads JSON directly; TOON is a presentation layer
- Schema validation is needed: JSON Schema is mature; TOON tooling is early stage
Using TOON in Practice
TOON is a translation layer, not a storage format. The workflow:
```python
# npm install toon-lang (Node.js reference implementation)
# Python: encode JSON to TOON before sending to the LLM
import json

def json_array_to_toon(name: str, data: list[dict]) -> str:
    """Convert a uniform array of objects to TOON tabular format."""
    if not data:
        return ""
    headers = list(data[0].keys())
    header_line = f"{name}[{len(data)}]{{{','.join(headers)}}}:"
    rows = []
    for item in data:
        rows.append("  " + ",".join(str(item.get(h, "")) for h in headers))
    return header_line + "\n" + "\n".join(rows)

# Your data stays as JSON internally
with open("experiments.json") as f:
    experiments = json.load(f)

# Convert to TOON only for LLM input
toon_input = json_array_to_toon("experiments", experiments)

prompt = f"""Analyze these experiment results and recommend the best configuration:

{toon_input}

Which run achieved the best balance of accuracy and training efficiency?"""
```

Your application keeps working with JSON. TOON is only used at the LLM boundary.
TOON for Infrastructure and Ansible Data
For those of us working with Ansible automation and cloud infrastructure, TOON is particularly useful for sending inventory data to an LLM for analysis or troubleshooting:
```
inventory[5]{hostname,role,os,cpu,ram_gb,datacenter,status}:
  web-prod-01,webserver,rhel-9.3,8,32,eu-west-1,healthy
  web-prod-02,webserver,rhel-9.3,8,32,eu-west-1,healthy
  db-prod-01,database,rhel-9.3,16,128,eu-west-1,warning
  k8s-worker-01,kubernetes,rhel-9.4,32,256,eu-west-1,healthy
  k8s-worker-02,kubernetes,rhel-9.4,32,256,eu-central,healthy
```

An LLM can immediately identify the warning status on db-prod-01 and suggest Kubernetes-based remediation or Terraform infrastructure changes, all while consuming half the tokens of the JSON equivalent.
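The same header-driven structure makes the block easy to sanity-check in plain Python before it ever reaches a model. A sketch that flags the non-healthy hosts from the inventory above (assumes simple comma-free values, as in this example):

```python
inventory_toon = """inventory[5]{hostname,role,os,cpu,ram_gb,datacenter,status}:
  web-prod-01,webserver,rhel-9.3,8,32,eu-west-1,healthy
  web-prod-02,webserver,rhel-9.3,8,32,eu-west-1,healthy
  db-prod-01,database,rhel-9.3,16,128,eu-west-1,warning
  k8s-worker-01,kubernetes,rhel-9.4,32,256,eu-west-1,healthy
  k8s-worker-02,kubernetes,rhel-9.4,32,256,eu-central,healthy"""

lines = inventory_toon.splitlines()
# The header declares the column names once for all rows
fields = lines[0][lines[0].index("{") + 1 : lines[0].index("}")].split(",")
hosts = [dict(zip(fields, row.strip().split(","))) for row in lines[1:]]

flagged = [h["hostname"] for h in hosts if h["status"] != "healthy"]
print(flagged)  # ['db-prod-01']
```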
LLM Compatibility
Can LLMs actually parse TOON reliably? Yes, because TOON deliberately uses patterns LLMs already understand:
- CSV-style rows: LLMs have seen billions of CSV examples in training data
- YAML-style indentation: deeply familiar from Python and configuration files
- Explicit headers: the `{field1,field2,field3}` declaration removes all ambiguity
In practice, GPT-4, Claude, and Llama 3 all parse TOON input correctly without special prompting. The format is self-describing: models infer the structure immediately from the header declaration.
Token Savings Summary
Realistic token savings for common AI input patterns:
- 10-row uniform array, 6 fields: JSON ~350 tokens, TOON ~150 tokens (57% reduction)
- 50-row dataset, 8 fields: JSON ~2,400 tokens, TOON ~900 tokens (63% reduction)
- Nested config, 2 levels: JSON ~120 tokens, TOON ~90 tokens (25% reduction)
- 100-row dataset, 10 fields: JSON ~6,000 tokens, TOON ~2,000 tokens (67% reduction)
The savings scale with row count. More rows, more savings, because the header is declared once regardless of data size.
The Bottom Line
Use JSON as your data format. Use TOON as your LLM input format. They are complementary:
- JSON for storage, APIs, inter-service communication, LLM output parsing
- TOON for feeding structured data to LLMs with minimal token overhead
TOON is not trying to replace JSON everywhere. It solves one specific problem, reducing token waste when sending structured data to language models, and it solves it well. For AI practitioners running thousands of LLM requests daily, switching tabular inputs from JSON to TOON can cut costs by 50-65 percent on those specific calls.
The format is stable at spec v3.0 and available via `npm install toon-lang`. It is still evolving, so now is a good time to experiment with it and shape where it goes.
