AgenticOps & AIOps Principal Architect | Built Australia’s first production LLM-based autonomous network operations program | Enterprise AI Platform | Mining & Critical Infrastructure
If you are building AI agents that browse the web, you
already know the pain: web research is a massive token hog.
I recently built a custom web-search skill for my GitHub
Copilot CLI to keep it up to date with live information. The results were
great, but the token consumption was through the roof. A single stripped HTML
search page was costing me roughly 1,500 tokens. A standard research session (5
searches + a few pages read) was burning ~17,500 tokens before the agent even
did any real reasoning.
It was expensive, slow, and polluted the model's attention
with extraction noise.
Then I had an "Aha!" moment: Search results aren't
unstructured text. They are uniform arrays of objects. To fix this, I borrowed
a concept from API data serialization: TOON (Token-Oriented Object Notation).
The Shift: From HTML Soup to Structured Data
TOON is a compact, open-source format designed to compress
JSON for LLM prompts. Its sweet spot is uniform arrays. Instead of repeating
keys on every line like JSON does, TOON uses a CSV-style tabular layout.
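TOON's core trick is easy to show in a few lines of Python. The sketch below is a hypothetical, minimal encoder that illustrates only the tabular idea for uniform arrays; it is not the real TOON library and skips the full spec (escaping of commas and newlines, nested objects, and so on):

```python
import json

def to_toon(name, rows):
    """Encode a uniform list of dicts as a TOON-style tabular block:
    one header line declaring the count and field names, then one
    CSV-like row per object. Simplified sketch: no escaping and no
    nesting; the real TOON spec handles those cases."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

results = [
    {"title": "TOON spec", "url": "https://example.com/a",
     "snippet": "Token-Oriented Object Notation"},
    {"title": "TOON intro", "url": "https://example.com/b",
     "snippet": "Compact tables for LLM prompts"},
]

toon = to_toon("results", results)
print(toon)
# Unlike JSON, each key name appears once, not once per object:
assert len(toon) < len(json.dumps(results))
```

The saving compounds with array length: the keys are paid for once in the header, so every additional row costs only its values.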
Instead of dumping 6,000 characters of scraped HTML back to
the model, I built a pure Python/curl pipeline to extract the data and format
it into TOON.
Before (Raw HTML Snippets): ~1,500 tokens per search.
After (TOON Format): a results[16] {title, url, snippet}: header line followed by one comma-separated row per result. Cost: ~437 tokens (71% reduction).
How the Pipeline Works (Zero Dependencies)
I integrated a 5-step pipeline into my agent using just curl
and Python standard libraries:
- Pure Python TOON Encoder: Handles arrays, dicts, and primitives natively.
- HTML to TOON Extractor: Parses Svelte-rendered HTML (via class selectors like search-snippet-title) to pull structured data straight into TOON.
- Table Extraction: Converts <table/th/td> spec sheets directly into TOON tabular format, turning 800-token tables into clean 100-token blocks.
- BM25-lite Paragraph Filter: For long articles, it scores paragraphs by query keyword overlap, returning only the 2-4 most relevant paragraphs instead of the whole page.
- Boilerplate Stripper: Regex patterns to kill navigation text, cookie banners, and copyright footers.
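The table-extraction step can be sketched with nothing but the standard library's html.parser. This is an illustrative guess at the approach, not the skill's actual code, and it assumes a single simple table with a header row (no colspans or nested tables):

```python
from html.parser import HTMLParser

class TableToToon(HTMLParser):
    """Collect <th> headers and <td> cells from a simple HTML table
    so they can be emitted as a TOON-style tabular block."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.headers, self.rows, self.current = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self.in_cell = True
        elif tag == "tr":
            self.current = []

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False
        elif tag == "tr" and self.current:
            # First complete row becomes the header; the rest are data.
            if not self.headers:
                self.headers = self.current
            else:
                self.rows.append(self.current)

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.current.append(data.strip())

def table_to_toon(html, name="specs"):
    p = TableToToon()
    p.feed(html)
    lines = [f"{name}[{len(p.rows)}]{{{','.join(p.headers)}}}:"]
    lines += ["  " + ",".join(r) for r in p.rows]
    return "\n".join(lines)

html = ("<table><tr><th>model</th><th>ram</th></tr>"
        "<tr><td>A1</td><td>8GB</td></tr>"
        "<tr><td>B2</td><td>16GB</td></tr></table>")
print(table_to_toon(html))
```

A spec sheet that would have arrived as hundreds of tokens of tag soup comes back as a header line plus one short row per entry.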
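The BM25-lite filter needs no search library at all. A minimal sketch of the scoring idea, using distinct keyword overlap rather than full BM25 term weighting (function name is hypothetical, not the skill's actual API):

```python
import re

def top_paragraphs(text, query, k=3):
    """Score each paragraph by how many distinct query keywords it
    contains, then return the top-k paragraphs in original order.
    A rough overlap score, not full BM25 (no IDF or length norm)."""
    keywords = {w.lower() for w in re.findall(r"\w{3,}", query)}
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    scored = []
    for i, p in enumerate(paragraphs):
        words = {w.lower() for w in re.findall(r"\w{3,}", p)}
        scored.append((len(keywords & words), i, p))
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [p for _, _, p in sorted(best, key=lambda t: t[1])]

doc = ("Cats are great.\n\n"
       "Python token pipelines compress context.\n\n"
       "The weather is nice.\n\n"
       "TOON token format saves tokens.")
print(top_paragraphs(doc, "token compression TOON", k=2))
```

Crude as it is, keyword overlap is enough to separate the two or three on-topic paragraphs from the filler around them.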
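The boilerplate stripper is just a handful of regexes applied line by line. The patterns below are illustrative guesses at the categories named above (nav menus, cookie banners, copyright footers), not the skill's actual list:

```python
import re

# Hypothetical patterns -- the real skill's regex list isn't shown here.
BOILERPLATE = [
    re.compile(r"(?im)^.*cookie.*$"),                  # cookie banners
    re.compile(r"(?im)^.*(copyright|©)\s*\d{4}.*$"),   # copyright footers
    re.compile(r"(?im)^(?:\s*\w+\s*\|)+.*$"),          # pipe-separated nav menus
]

def strip_boilerplate(text):
    """Blank out lines matching known boilerplate patterns, then
    collapse the empty runs left behind."""
    for pat in BOILERPLATE:
        text = pat.sub("", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()

page = ("Home | About | Contact\n"
        "Real content here.\n"
        "We use cookies to improve your experience.\n"
        "© 2024 Example Corp.")
print(strip_boilerplate(page))
```

Aggressive line-level deletion is safe here because the structured data has already been extracted; this pass only cleans what remains for the model to read.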
The Real-World Results
These aren't estimates; they are measured from live requests
via my home proxy:
- 1 Brave search result page: 1,500 tokens ➔ 437 tokens (71% saved)
- Full uncapped search page: 16,000 tokens ➔ 437 tokens (97% saved)
- Spec/Table page: 2,000 tokens ➔ 400 tokens (80% saved)
Total impact: A standard 5-search research session dropped
from ~17,500 tokens to ~4,000. That’s 13,500 tokens of context window freed up for
actual reasoning, per session.
The Takeaway
Stop treating web scraping output as text to be truncated.
It’s a structured dataset. By treating the Web-to-Agent boundary the same way
we treat the API-to-LLM boundary, we can build significantly faster, cheaper,
and smarter agents.
#AI #MachineLearning #LLM #AIAgents #Python #WebScraping #OpenSource #GitHubCopilot #SoftwareEngineering