Thursday, April 2, 2026

How I Cut My AI Agent's Web Search Token Costs by 71% using TOON

AgenticOps & AIOps Principal Architect | Built Australia’s first production LLM-based autonomous network operations program | Enterprise AI Platform | Mining & Critical Infrastructure

If you are building AI agents that browse the web, you already know the pain: web research is a massive token hog.

I recently built a custom web-search skill for my GitHub Copilot CLI to keep it up to date with live information. The results were great, but the token consumption was through the roof. A single stripped HTML search page was costing me roughly 1,500 tokens. A standard research session (5 searches + a few pages read) was burning ~17,500 tokens before the agent even did any real reasoning.

It was expensive, slow, and polluted the model's attention with extraction noise.

Then I had an "Aha!" moment: Search results aren't unstructured text. They are uniform arrays of objects. To fix this, I borrowed a concept from API data serialization: TOON (Token-Oriented Object Notation).

The Shift: From HTML Soup to Structured Data

TOON is a compact, open-source format designed to compress JSON for LLM prompts. Its sweet spot is uniform arrays. Instead of repeating keys on every line like JSON does, TOON uses a CSV-style tabular layout.
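To make the key-repetition point concrete, here is a minimal sketch of the idea (not the full TOON spec — real TOON also handles quoting, escaping, and nesting; the encoder and sample data below are my own illustration):

```python
import json

# Hypothetical search results -- a uniform array of flat objects.
results = [
    {"title": "TOON spec", "url": "https://example.com/a", "snippet": "Token-oriented..."},
    {"title": "TOON vs JSON", "url": "https://example.com/b", "snippet": "A comparison..."},
]

def to_toon(name, rows):
    """Minimal TOON-style encoder for a uniform array of flat dicts.
    The header declares the array length and field names once;
    each row is just comma-joined values, CSV-style."""
    fields = list(rows[0])
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

print(json.dumps(results))       # JSON repeats "title", "url", "snippet" per object
print(to_toon("results", results))  # TOON states the keys exactly once in the header
```

The saving scales with the array: every extra result row costs only its values, not another copy of the keys.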

Instead of dumping 6,000 characters of scraped HTML back to the model, I built a pure Python/curl pipeline to extract the data and format it into TOON.

Before (Raw HTML Snippets): ~1,500 tokens per search.

After (TOON Format): results[16]{title,url,snippet}:

Cost: ~437 tokens (71% reduction).


How the Pipeline Works (0 Dependencies)

I integrated a 5-step pipeline into my agent using just curl and Python standard libraries:

  • Pure Python TOON Encoder: Handles arrays, dicts, and primitives natively.
  • HTML to TOON Extractor: Parses Svelte-rendered HTML (via class selectors like search-snippet-title) to pull structured data straight into TOON.
  • Table Extraction: Converts <table/th/td> spec sheets directly into TOON tabular format, turning 800-token tables into clean 100-token blocks.
  • BM25-lite Paragraph Filter: For long articles, it scores paragraphs by query keyword overlap, returning only the 2-4 most relevant paragraphs instead of the whole page.
  • Boilerplate Stripper: Regex patterns to kill navigation text, cookie banners, and copyright footers.

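The BM25-lite filter in step 4 can be sketched like this (a simplified keyword-overlap scorer with length normalisation; the function name and scoring details are my own illustration, not the author's code):

```python
import re

def top_paragraphs(text, query, k=3):
    """Score paragraphs by overlap with the query's keywords (a BM25-lite
    heuristic) and return the k best ones, preserving document order."""
    terms = set(re.findall(r"\w+", query.lower()))
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]

    def score(p):
        words = re.findall(r"\w+", p.lower())
        if not words:
            return 0.0
        hits = sum(1 for w in words if w in terms)
        # Mild length normalisation so long paragraphs can't win on volume alone.
        return hits / (len(words) ** 0.5)

    ranked = sorted(range(len(paras)), key=lambda i: score(paras[i]), reverse=True)[:k]
    return [paras[i] for i in sorted(ranked)]
```

Feeding a scraped article through a filter like this means only the handful of paragraphs that actually mention the query terms ever reach the model's context.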
The Real-World Results

These aren't estimates; they are measured from live requests via my home proxy:

  • 1 Brave search result page: 1,500 tokens → 437 tokens (71% saved)
  • Full uncapped search page: 16,000 tokens → 437 tokens (97% saved)
  • Spec/Table page: 2,000 tokens → 400 tokens (80% saved)

Total impact: A standard 5-search research session dropped from ~17,500 tokens to ~4,000. That’s 13,500 tokens of context window freed up for actual reasoning, per session.
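As a back-of-envelope check, the session totals work out roughly like this (the exact 5-search/5-page mix is my assumption, chosen to match the ~17,500 figure; the per-item costs are the measured numbers above):

```python
# Rough session arithmetic using the per-page figures from this post.
# Assumed mix: 5 search pages plus 5 content/spec pages read.
searches, pages = 5, 5

before = searches * 1_500 + pages * 2_000   # raw stripped-HTML costs
after  = searches * 437   + pages * 400     # TOON-formatted equivalents

print(before, after, before - after)
```

That lands on ~4,200 tokens after and ~13,300 saved — in line with the ~4,000 / 13,500 quoted above, with the residual gap down to which pages a given session actually reads.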

The Takeaway

Stop treating web scraping output as text to be truncated. It’s a structured dataset. By treating the Web-to-Agent boundary the same way we treat the API-to-LLM boundary, we can build significantly faster, cheaper, and smarter agents.

#AI #MachineLearning #LLM #AIAgents #Python #WebScraping #OpenSource #GitHubCopilot #SoftwareEngineering
