
Context Architecture for AI Agents: From 0% to 92% Accuracy

Real experiment results: how adding context layers to an AI data agent improved accuracy from 0% to 92%. Column descriptions, business rules, and verified queries.

Luca Berton
· 3 min read

Everyone says context matters. Here are the numbers.

There is a lot of talk about “context architecture” for AI agents. But how much does context actually matter? And which context moves the needle?

A team ran a rigorous experiment with a real AI analytics agent built for a healthcare company operating multiple clinics. No synthetic datasets. Real user questions. Same LLM, same data, zero prompt engineering tricks. They added context one layer at a time, measuring accuracy after every change.

The results are striking, and they validate something that experienced data engineers already know intuitively.

The experiment: 6 iterations

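| Iteration | What Changed | SQL Generation | Accuracy |
|-----------|--------------|----------------|----------|
| 1 | Raw tables only | 0% | 0% |
| 2 | Modeled table (no context) | 38.5% | 0% |
| 3 | Column descriptions added | 100% | 15% |
| 4 | Business rules and instructions | 100% | 77% |
| 5-6 | Metrics, verified queries, eval refinement | 100% | 92% |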
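
The write-up does not spell out how the two percentage columns were computed. Here is a minimal sketch, assuming "SQL Generation" is the share of questions for which the agent produced SQL that executed, and "Accuracy" is the share whose result matched a known-correct answer. `EvalCase` and `score` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalCase:
    """One evaluated question (hypothetical structure, not from the experiment)."""
    question: str
    sql_executed: bool        # did the agent produce SQL that ran without error?
    result: Optional[object]  # what the generated SQL returned, if it ran
    expected: object          # known-correct answer for this question


def score(cases: list[EvalCase]) -> dict[str, float]:
    """Return SQL generation rate and accuracy as percentages."""
    if not cases:
        raise ValueError("no cases to score")
    n = len(cases)
    generated = sum(c.sql_executed for c in cases)
    # A case counts as accurate only if the SQL ran AND the answer matched.
    correct = sum(c.sql_executed and c.result == c.expected for c in cases)
    return {
        "sql_generation": round(100 * generated / n, 1),
        "accuracy": round(100 * correct / n, 1),
    }
```

Scoring the two numbers separately is what makes iteration 3 visible: SQL that runs every time and is still wrong most of the time.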

Let that sink in. The same LLM went from 0% accuracy to 92% accuracy: not by switching to a better model, not by clever prompt engineering, but by progressively adding the context that a good analyst accumulates over months on the job.

The boring stuff had the biggest impact

The single most impactful change was not some sophisticated RAG pipeline or multi-agent orchestration. It was column descriptions and a clean data model.

Going from raw tables to a properly modeled table with column descriptions took SQL generation from 0% to 100%. From that point on, the agent could always generate SQL. What it could not do was generate SQL that gave trustworthy results until it understood what the columns actually meant.

This should not be surprising to anyone who has onboarded a new analyst. You do not hand them raw database access and say “figure it out.” You give them:

  • A data dictionary explaining what each column means
  • Business rules about how metrics are calculated
  • Context about edge cases and data quirks
  • Verified queries they can use as reference

The AI agent needs exactly the same thing.

Context layers ranked by impact

Based on the experiment results and my experience deploying AI agents against enterprise data, here is how I rank the context layers:

Tier 1: Foundation (0% → 15% accuracy)

Data modeling and column descriptions. This is the single most important investment. A well-modeled table with clear column names and descriptions gives the LLM enough to generate syntactically correct SQL that actually references the right data.

Without this, the agent is guessing. It might generate valid SQL, but against the wrong columns, with wrong join conditions, and wrong aggregation logic.

What to include:

# Example column description format
tables:
  - name: appointments
    description: "Patient appointments across all clinic locations"
    columns:
      - name: appointment_date
        description: "Date of the appointment (UTC). NULL for cancelled appointments that were never rescheduled."
        type: date
      - name: provider_id
        description: "Foreign key to providers table. Maps to the attending physician, not the referring physician."
        type: integer
      - name: status
        description: "Current appointment status. Values: scheduled, completed, cancelled, no_show. Note: 'completed' means the patient was seen, not that billing is finalized."
        type: varchar

The specificity matters. “appointment_date” is not enough. “Date of the appointment (UTC). NULL for cancelled appointments that were never rescheduled” is the context that prevents wrong answers.
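
One way to enforce that specificity is a small lint over the data dictionary. The sketch below flags columns whose description is empty or only repeats the words already in the column name. The heuristic and the function name `weak_descriptions` are my own assumptions, not part of the experiment.

```python
import re


def weak_descriptions(tables: list[dict]) -> list[str]:
    """Return 'table.column' for columns whose description adds nothing beyond the name."""
    flagged = []
    for table in tables:
        for col in table.get("columns", []):
            name_words = set(re.findall(r"[a-z0-9]+", col["name"].lower()))
            desc_words = set(re.findall(r"[a-z0-9]+", col.get("description", "").lower()))
            # Weak if the description is empty or uses only words from the column name.
            if not (desc_words - name_words):
                flagged.append(f"{table['name']}.{col['name']}")
    return flagged


tables = [{
    "name": "appointments",
    "columns": [
        {"name": "appointment_date", "description": "Appointment date"},
        {"name": "provider_id",
         "description": "Foreign key to providers table. Maps to the attending physician."},
    ],
}]
print(weak_descriptions(tables))
```

A check like this will not tell you whether a description is correct, only that someone wrote more than the column name.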

Tier 2: Business logic (15% → 77% accuracy)

Business rules and calculation instructions. This is where domain knowledge lives. Every organization has implicit rules that are not encoded in the schema:

  • “Active patients” means patients with at least one appointment in the last 12 months
  • Revenue calculations exclude write-offs and adjustments
  • “New patient” is defined by the first appointment at any location, not per-clinic
  • Clinic performance metrics use a rolling 90-day window, not calendar quarter

Without these rules, the agent will generate technically correct SQL that answers the wrong question. It will count all patients instead of active ones. It will include write-offs in revenue. It will define “new” differently than the business does.

business_rules:
  - rule: "Active patient definition"
    description: "A patient is considered active if they have at least one completed appointment in the trailing 12 months from the query date."
    sql_hint: "WHERE status = 'completed' AND appointment_date >= CURRENT_DATE - INTERVAL '12 months'"
    
  - rule: "Revenue calculation"
    description: "Revenue = sum of payment_amount where payment_status = 'posted'. Exclude adjustments (type = 'adjustment') and write-offs (type = 'writeoff')."
    sql_hint: "SUM(payment_amount) WHERE payment_status = 'posted' AND type NOT IN ('adjustment', 'writeoff')"
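
To make the "technically correct SQL, wrong question" failure concrete, here is a small sketch using Python's built-in `sqlite3`. The table contents are made up, the date arithmetic uses SQLite's `date()` modifier instead of the `INTERVAL` syntax in the hints above, and the `as_of` date is fixed so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointments (patient_id INTEGER, status TEXT, appointment_date TEXT)"
)
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?, ?)",
    [
        (1, "completed", "2024-05-10"),  # active: completed within 12 months
        (2, "completed", "2022-01-15"),  # inactive: last completed visit is too old
        (3, "cancelled", "2024-04-01"),  # inactive: recent, but never actually seen
    ],
)


def count_patients(conn: sqlite3.Connection, as_of: str, active_only: bool) -> int:
    """Count distinct patients, optionally applying the 'active patient' rule."""
    sql = "SELECT COUNT(DISTINCT patient_id) FROM appointments"
    params: tuple = ()
    if active_only:
        # Business rule: at least one completed appointment in the trailing 12 months.
        sql += " WHERE status = 'completed' AND appointment_date >= date(?, '-12 months')"
        params = (as_of,)
    return conn.execute(sql, params).fetchone()[0]


print(count_patients(conn, "2024-06-01", active_only=False))  # every patient ever booked
print(count_patients(conn, "2024-06-01", active_only=True))   # patients the business calls active
```

Both queries are valid SQL. Only the second one answers the question the business is asking.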

Tier 3: Verification (77% → 92% accuracy)

Verified queries, metrics definitions, and evaluation refinement. This is the layer that turns a decent agent into a reliable one:

  • Golden queries: known-correct SQL for common questions, used as few-shot examples
  • Metric definitions: exact formulas for KPIs with test cases
  • Edge case documentation: what happens with NULL values, timezone boundaries, fiscal year vs calendar year

verified_queries:
  - question: "How many new patients did we see last month?"
    sql: |
      SELECT COUNT(DISTINCT patient_id)
      FROM appointments
      WHERE status = 'completed'
        AND appointment_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
        AND appointment_date < DATE_TRUNC('month', CURRENT_DATE)
        AND patient_id NOT IN (
          SELECT DISTINCT patient_id
          FROM appointments
          WHERE status = 'completed'
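            AND appointment_date < DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
            -- Guard: NOT IN matches nothing if the subquery returns a NULL patient_id
            AND patient_id IS NOT NULL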
        )
    expected_result_range: "Typically 50-200 per clinic per month"
    notes: "New = first completed appointment ever, not first at a specific clinic"
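
The experiment description does not say how verified queries were chosen as few-shot examples for a given question. A minimal sketch, assuming simple word overlap (Jaccard similarity) between the new question and each stored question; a production system might use embeddings instead. `select_examples` is a hypothetical helper.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_examples(question: str, verified_queries: list[dict], k: int = 2) -> list[dict]:
    """Pick the k verified queries whose question shares the most words with the new one."""
    q = _tokens(question)

    def jaccard(entry: dict) -> float:
        t = _tokens(entry["question"])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(verified_queries, key=jaccard, reverse=True)[:k]


verified = [
    {"question": "What was total revenue last quarter?", "sql": "..."},
    {"question": "How many new patients did we see last month?", "sql": "..."},
]
best = select_examples("How many new patients this month?", verified, k=1)
print(best[0]["question"])
```

The selected entries, including their `notes`, would then be placed in the agent's context ahead of the user's question.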

What this means for agent builders

1. Invest in data foundations before agent sophistication

The biggest accuracy gains came from the “boring” work: clean data models and column descriptions. If you are building an AI agent that queries data, spend 80% of your effort on the data layer and 20% on the agent layer.

2. Context is not just retrieval

RAG gets all the attention, but this experiment shows that structured context (schemas, rules, verified queries) matters more than unstructured document retrieval for data agents. The context needs to be precise, not just relevant.

3. Model choice is secondary to context quality

The experiment used the same LLM throughout. The accuracy difference came entirely from context. Upgrading from GPT-4 to GPT-5 will not fix an agent that does not understand your data model. Fixing your column descriptions will.

4. Build context iteratively

Do not try to capture all business rules on day one. The experiment shows clear returns at each layer:

  1. Week 1: Model your tables and write column descriptions
  2. Week 2: Document the top 10 business rules that affect query results
  3. Week 3: Create verified queries for the 20 most common questions
  4. Week 4: Run evaluations and refine based on failure cases
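
One cheap check for the Week 4 evaluation step is to compare each verified query's result against the expected range recorded with it. This is a sketch under an assumption: the range is stored as a numeric `(low, high)` pair, whereas the example earlier keeps it as free text. `check_result_ranges` is a hypothetical helper.

```python
def check_result_ranges(
    results: dict[str, float],
    expected: dict[str, tuple[float, float]],
) -> list[str]:
    """Return the questions whose result is missing or outside its expected range."""
    failures = []
    for question, (low, high) in expected.items():
        value = results.get(question)
        if value is None or not (low <= value <= high):
            failures.append(question)
    return failures


expected = {"new patients last month": (50, 200)}
print(check_result_ranges({"new patients last month": 4200}, expected))
```

A failure here does not say which context layer is at fault, but it gives you a concrete case to trace back to a missing description or rule.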

5. This applies beyond SQL agents

The same principle applies to any agent that operates on domain-specific data:

  • Infrastructure agents need context about your environment topology, naming conventions, and runbook procedures
  • Code review agents need context about your team's coding standards, architecture decisions, and tech debt areas
  • Customer support agents need context about product features, known issues, and escalation rules

The pattern is universal: the agent needs the context that an experienced human accumulates over time.

The context architecture stack

Based on these results and production deployments, here is the context architecture I recommend:

┌─────────────────────────────────────┐
│         Verified Queries            │  ← Few-shot examples
│         (Golden SQLs)               │     77% → 92%
├─────────────────────────────────────┤
│       Business Rules                │  ← Domain logic
│    (Metrics, Definitions)           │     15% → 77%
├─────────────────────────────────────┤
│      Column Descriptions            │  ← Schema context
│     (Data Dictionary)               │     0% → 15%
├─────────────────────────────────────┤
│        Data Model                   │  ← Clean tables
│   (Normalized, Well-Named)          │     Foundation
└─────────────────────────────────────┘

Each layer multiplies the value of the layers below it. Skip the foundation and no amount of business rules will help. Skip the business rules and your verified queries will not generalize to new questions.
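
The layering can be sketched as a function that assembles the agent's context bottom-up and refuses to stack a higher layer on a missing lower one. The layer names, the strict ordering check, and `build_context` itself are assumptions for illustration, not a description of the system used in the experiment.

```python
LAYER_ORDER = ["data_model", "column_descriptions", "business_rules", "verified_queries"]


def build_context(layers: dict[str, str]) -> str:
    """Concatenate the provided layers bottom-up; reject gaps below a populated layer."""
    sections = []
    for i, name in enumerate(LAYER_ORDER):
        text = layers.get(name, "").strip()
        if not text:
            # An empty layer is fine only if nothing above it is populated.
            higher = [n for n in LAYER_ORDER[i + 1:] if layers.get(n, "").strip()]
            if higher:
                raise ValueError(f"layer '{name}' is empty but {higher} are provided")
            break
        sections.append(f"## {name.replace('_', ' ').title()}\n{text}")
    return "\n\n".join(sections)


print(build_context({
    "data_model": "appointments: one row per appointment",
    "column_descriptions": "status: scheduled, completed, cancelled, no_show",
}))
```

Failing loudly on a gap is a design choice: it keeps a team from adding golden queries on top of tables nobody has described yet.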

Key takeaway

Context was the difference between 0% and 92% accuracy. Not the model. Not the prompt. Not the agent framework.

The “boring” data engineering work (modeling tables, writing column descriptions, documenting business rules) had more impact on agent accuracy than any other factor.

If you are building AI agents that need to work with real data, stop optimizing your prompts and start investing in your context layers.
