Why AI Agents Need Memory
Most AI agents today operate in a memoryless loop: prompt in, response out, context forgotten. Every session starts from zero. The agent cannot recall what it learned yesterday, what relationships it discovered, or what conclusions it reached.
GBrain changes this. Built by Y Combinator president Garry Tan, GBrain is a self-wiring knowledge graph that acts as a persistent memory layer for AI agents. Unlike vector databases that store flat embeddings, GBrain creates a structured graph where entities connect through typed relationships, and the graph grows automatically as the agent processes new information.
I covered the architecture and vision behind GBrain previously. This post is the hands-on implementation guide.
Architecture Overview
GBrain consists of three core components:
- Entity Extraction: LLM-powered parsing of unstructured text into structured entities and relationships
- Graph Storage: Neo4j as the persistent knowledge graph backend
- Semantic Search: Sentence transformers for embedding-based entity retrieval
The key insight: the graph wires itself. Feed it documents, conversations, or observations, and it automatically extracts entities, infers relationships, and connects new knowledge to existing nodes.
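To make the "wires itself" idea concrete before bringing in Neo4j, here is a minimal in-memory sketch. It assumes extraction has already produced entity and relationship dictionaries; `TinyGraph` and `wire` are hypothetical names used only for illustration and are not part of GBrain.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory stand-in for the graph store (illustrative only)."""

    def __init__(self):
        self.nodes = {}                  # name -> entity type
        self.edges = defaultdict(float)  # (source, relation, target) -> weight

    def wire(self, extracted: dict) -> None:
        # Entities are keyed by name, so a repeated mention reuses the same node.
        for entity in extracted.get("entities", []):
            self.nodes[entity["name"]] = entity["type"]
        # A repeated relationship strengthens the existing edge instead of duplicating it.
        for rel in extracted.get("relationships", []):
            key = (rel["source"], rel["relation"], rel["target"])
            self.edges[key] += 1.0


graph = TinyGraph()
graph.wire({
    "entities": [{"name": "GPU Operator", "type": "technology"},
                 {"name": "DRA", "type": "concept"}],
    "relationships": [{"source": "GPU Operator", "relation": "uses", "target": "DRA"}],
})
graph.wire({
    "entities": [{"name": "DRA", "type": "concept"},
                 {"name": "OpenShift AI", "type": "technology"}],
    "relationships": [{"source": "OpenShift AI", "relation": "depends_on", "target": "DRA"},
                      {"source": "GPU Operator", "relation": "uses", "target": "DRA"}],
})
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

The second call adds one new node and one new edge, and it reinforces the existing `GPU Operator` to `DRA` edge rather than creating a duplicate. The Neo4j `MERGE` statements later in this guide play the same role.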
Prerequisites
# Python 3.10+
pip install neo4j sentence-transformers openai networkx

You will also need:
- A running Neo4j instance (Docker works fine)
- An OpenAI API key (for entity extraction)
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5-community

Step 1: Define the Knowledge Graph Schema
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    entity_type: str  # person, concept, tool, project, etc.
    description: Optional[str] = None
    embedding: Optional[list[float]] = None


@dataclass
class Relationship:
    source: str
    target: str
    relation_type: str  # uses, created_by, depends_on, etc.
    weight: float = 1.0
    context: Optional[str] = None

Step 2: Entity Extraction with LLMs
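One detail worth checking in this step: a prompt template that describes JSON shapes contains literal braces, and `str.format` treats every `{...}` as a replacement field. The short sketch below shows the behavior; `TEMPLATE_RAW` and `TEMPLATE_ESCAPED` are hypothetical names used only for illustration.

```python
# Literal braces in a template must be doubled, otherwise str.format()
# tries to look up "name, type" as a field name and raises KeyError.
TEMPLATE_RAW = "Return entities as {name, type}. Text: {text}"
TEMPLATE_ESCAPED = "Return entities as {{name, type}}. Text: {text}"

try:
    TEMPLATE_RAW.format(text="hello")
    raw_ok = True
except KeyError:
    raw_ok = False

rendered = TEMPLATE_ESCAPED.format(text="hello")
print(raw_ok)
print(rendered)
```

Doubling the braces keeps the JSON shape readable in the prompt while leaving `{text}` as the only placeholder.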
import json

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Literal braces are doubled so that str.format() only substitutes {text}.
EXTRACTION_PROMPT = """Extract entities and relationships from the following text.
Return JSON with:
- entities: list of {{name, type, description}}
- relationships: list of {{source, target, relation, context}}
Entity types: person, organization, technology, concept, project, event
Relationship types: uses, created, maintains, depends_on, competes_with, part_of
Text: {text}"""


def extract_entities(text: str) -> dict:
    """Ask the model for entities and relationships and parse the JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(text=text)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

Step 3: Embedding Generation
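from sentence_transformers import SentenceTransformer

# Small general-purpose model that produces 384-dimensional embeddings.
model = SentenceTransformer('all-MiniLM-L6-v2')


def embed_entity(entity: Entity) -> list[float]:
    """Embed the entity name plus its description (or its type as a fallback)."""
    text = f"{entity.name}: {entity.description or entity.entity_type}"
    return model.encode(text).tolist()

Step 4: Neo4j Graph Storage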
from neo4j import GraphDatabase


class GBrainStore:
    def __init__(self, uri="bolt://localhost:7687", user="neo4j", password="password"):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def upsert_entity(self, entity: Entity):
        with self.driver.session() as session:
            session.run("""
                MERGE (e:Entity {name: $name})
                SET e.type = $type,
                    e.description = $description,
                    e.embedding = $embedding,
                    e.updated_at = datetime()
            """, name=entity.name, type=entity.entity_type,
                description=entity.description,
                embedding=entity.embedding)

    def upsert_relationship(self, rel: Relationship):
        with self.driver.session() as session:
            session.run("""
                MATCH (s:Entity {name: $source})
                MATCH (t:Entity {name: $target})
                MERGE (s)-[r:RELATES_TO {type: $rel_type}]->(t)
                // coalesce() covers a newly created edge, where r.weight is still null
                SET r.weight = coalesce(r.weight, 0) + $weight,
                    r.context = $context,
                    r.updated_at = datetime()
            """, source=rel.source, target=rel.target,
                rel_type=rel.relation_type, weight=rel.weight,
                context=rel.context)

    def query_neighbors(self, entity_name: str, depth: int = 2) -> list:
        # A variable-length bound cannot be passed as a query parameter,
        # so the validated integer is formatted into the query text instead.
        depth = max(1, int(depth))
        query = f"""
            MATCH path = (e:Entity {{name: $name}})-[*1..{depth}]-(neighbor)
            RETURN neighbor.name AS name, neighbor.type AS type,
                   neighbor.description AS description,
                   length(path) AS distance
            ORDER BY distance
            LIMIT 20
        """
        with self.driver.session() as session:
            result = session.run(query, name=entity_name)
            return [dict(record) for record in result]

Step 5: Semantic Search
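Semantic search in this step ranks stored entities by cosine similarity between the query embedding and each entity embedding. As a dependency-free sketch of that scoring, using made-up two-dimensional vectors rather than real model output (`cosine_similarity`, `top_k`, and the `toy` data are hypothetical names for illustration):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], candidates: dict, k: int = 2) -> list:
    """Return the k best (score, name) pairs, highest score first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in candidates.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy embeddings; real ones from all-MiniLM-L6-v2 have 384 dimensions.
toy = {"DRA": [1.0, 0.0], "MIG": [0.6, 0.8], "Neo4j": [0.0, 1.0]}
ranked = top_k([1.0, 0.0], toy)
print([name for _, name in ranked])
```

The class that follows computes the same quantity with NumPy over every entity that has a stored embedding.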
import numpy as np


class GBrainSearch:
    def __init__(self, store: GBrainStore):
        self.store = store

    def find_similar(self, query: str, top_k: int = 5) -> list[tuple[float, dict]]:
        """Return the top_k (cosine similarity, entity record) pairs for the query."""
        query_embedding = model.encode(query)  # model is defined in Step 3
        query_norm = np.linalg.norm(query_embedding)
        scored = []
        with self.store.driver.session() as session:
            result = session.run("""
                MATCH (e:Entity)
                WHERE e.embedding IS NOT NULL
                RETURN e.name AS name, e.type AS type,
                       e.description AS description, e.embedding AS embedding
            """)
            # Consume the records while the session is still open.
            for record in result:
                embedding = np.asarray(record["embedding"])
                similarity = np.dot(query_embedding, embedding) / (
                    query_norm * np.linalg.norm(embedding)
                )
                scored.append((float(similarity), dict(record)))
        scored.sort(reverse=True, key=lambda x: x[0])
        return scored[:top_k]

Step 6: The Self-Wiring Pipeline
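Before wiring extracted output into the graph, it can help to validate it, because the LLM reply is not guaranteed to match the requested shape. The `upsert_relationship` query in Step 4 also matches both endpoints by name, so a relationship that points at an entity that was never stored is silently skipped. The sketch below is one possible guard and is not part of GBrain: `validate_extraction` and `ALLOWED_ENTITY_TYPES` are hypothetical names, and the allowed types are copied from the extraction prompt in Step 2.

```python
ALLOWED_ENTITY_TYPES = {"person", "organization", "technology",
                        "concept", "project", "event"}


def validate_extraction(raw: dict) -> dict:
    """Keep well-formed entities and the relationships whose endpoints both survived."""
    entities = []
    for item in raw.get("entities") or []:
        if not isinstance(item, dict):
            continue
        name, etype = item.get("name"), item.get("type")
        if not isinstance(name, str) or not isinstance(etype, str):
            continue
        name, etype = name.strip(), etype.strip().lower()
        if name and etype in ALLOWED_ENTITY_TYPES:
            entities.append({"name": name, "type": etype,
                             "description": item.get("description")})

    known = {entity["name"] for entity in entities}
    relationships = [
        rel for rel in raw.get("relationships") or []
        if isinstance(rel, dict)
        and rel.get("source") in known
        and rel.get("target") in known
        and rel.get("relation")
    ]
    return {"entities": entities, "relationships": relationships}


sample = {
    "entities": [
        {"name": "DRA", "type": "Concept"},
        {"name": "", "type": "concept"},          # empty name: dropped
        {"type": "technology"},                   # missing name: dropped
        {"name": "GPU Operator", "type": "technology", "description": "NVIDIA operator"},
    ],
    "relationships": [
        {"source": "GPU Operator", "target": "DRA", "relation": "uses"},
        {"source": "GPU Operator", "target": "Kubernetes", "relation": "uses"},  # unknown target
    ],
}
cleaned = validate_extraction(sample)
print(len(cleaned["entities"]), "entities,", len(cleaned["relationships"]), "relationships")
```

A pipeline could call a function like this between extraction and storage, and log whatever was dropped so extraction quality can be monitored.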
class GBrain:
    def __init__(self):
        self.store = GBrainStore()
        self.search = GBrainSearch(self.store)

    def ingest(self, text: str):
        """Feed text into the brain; it wires itself."""
        # Extract structured knowledge
        extracted = extract_entities(text)

        # Create and store entities
        for e in extracted.get("entities", []):
            entity = Entity(
                name=e["name"],
                entity_type=e["type"],
                description=e.get("description")
            )
            entity.embedding = embed_entity(entity)
            self.store.upsert_entity(entity)

        # Wire relationships
        for r in extracted.get("relationships", []):
            rel = Relationship(
                source=r["source"],
                target=r["target"],
                relation_type=r["relation"],
                context=r.get("context")
            )
            self.store.upsert_relationship(rel)

    def recall(self, query: str, depth: int = 2) -> dict:
        """Query the brain: semantic search plus graph traversal."""
        # Find relevant entities
        similar = self.search.find_similar(query, top_k=3)

        # Expand via graph
        context = []
        for score, entity in similar:
            neighbors = self.store.query_neighbors(entity["name"], depth)
            context.append({
                "entity": entity["name"],
                "relevance": float(score),
                "connections": neighbors
            })
        return {"query": query, "results": context}

Usage Example
brain = GBrain()

# Feed it knowledge
brain.ingest("""
Kubernetes 1.32 introduced DRA (Dynamic Resource Allocation) for GPU scheduling.
The NVIDIA GPU Operator uses DRA to expose GPUs to pods. Red Hat OpenShift AI
builds on this for multi-tenant GPU sharing on bare metal clusters.
""")
brain.ingest("""
Luca Berton presented 'Multi-tenant GPUs on Bare Metal OpenShift AI' at
Red Hat Summit 2026 Discovery Theater. The talk covered MIG, time-slicing,
and DRA-based GPU partitioning strategies.
""")

# Query it
result = brain.recall("GPU sharing strategies for Kubernetes")
# Depending on what the LLM extracted, this should surface connected knowledge
# such as: DRA -> GPU Operator -> OpenShift AI -> MIG -> time-slicing

Why This Matters for Production AI
GBrain solves three critical problems:
- Context window limitations: Instead of stuffing everything into a prompt, query only relevant subgraphs
- Knowledge persistence: The agent remembers across sessions, building cumulative understanding
- Relationship reasoning: Graph traversal reveals connections that flat vector search misses
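As an illustration of the first point, the dictionary returned by `recall` can be flattened into a few compact lines and placed in the prompt instead of the full source documents. This is a sketch under assumptions: `format_context` and `max_connections` are hypothetical names, and the sample data is invented to match the shape that `recall` returns.

```python
def format_context(recall_result: dict, max_connections: int = 5) -> str:
    """Render recall() output as a short text block suitable for a prompt."""
    lines = []
    for hit in recall_result.get("results", []):
        lines.append(f"{hit['entity']} (relevance {hit['relevance']:.2f})")
        # Cap the neighbors per hit so the context stays small.
        for conn in hit["connections"][:max_connections]:
            lines.append(f"  - {conn['name']} [{conn['type']}], {conn['distance']} hop(s) away")
    return "\n".join(lines)


sample_result = {
    "query": "GPU sharing strategies",
    "results": [{
        "entity": "DRA",
        "relevance": 0.8123,
        "connections": [
            {"name": "GPU Operator", "type": "technology", "description": None, "distance": 1},
            {"name": "OpenShift AI", "type": "technology", "description": None, "distance": 2},
        ],
    }],
}
context_block = format_context(sample_result)
print(context_block)
```

Limiting the number of connections per hit is one simple way to keep the injected context bounded regardless of how large the graph grows.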
For enterprise AI platforms, this pattern enables:
- Audit trails: Every fact traced back to its source document
- Knowledge decay: Timestamp-based relevance scoring (older facts decay)
- Multi-agent sharing: Multiple agents can read/write the same knowledge graph
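Knowledge decay can be implemented in several ways. One common option, assumed here rather than taken from GBrain, is exponential decay with a half-life applied to the stored edge weight using its `updated_at` timestamp. `decayed_weight` and the 30-day default are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def decayed_weight(weight: float, updated_at: datetime, now: datetime,
                   half_life_days: float = 30.0) -> float:
    """Halve the weight for every half_life_days elapsed since updated_at."""
    age_days = max((now - updated_at).total_seconds(), 0.0) / 86400.0
    return weight * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
fresh = decayed_weight(2.0, now, now)
month_old = decayed_weight(2.0, now - timedelta(days=30), now)
two_months_old = decayed_weight(2.0, now - timedelta(days=60), now)
print(fresh, month_old, two_months_old)
```

Computing the decay at query time, instead of rewriting stored weights, keeps the original evidence intact and lets the half-life be tuned later.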
GBrain vs. Traditional RAG
| Aspect | Vector RAG | GBrain |
|---|---|---|
| Storage | Flat embeddings | Structured graph |
| Retrieval | Similarity search | Graph traversal + similarity |
| Relationships | Implicit | Explicit typed edges |
| Growth | Manual chunking | Self-wiring extraction |
| Reasoning | Single-hop | Multi-hop via graph paths |
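The last row of the table can be illustrated with a breadth-first traversal over a toy adjacency list. This is a sketch for intuition only: `neighbors_within` is a hypothetical helper, and the edges are invented to mirror the usage example rather than read from a real graph.

```python
from collections import deque


def neighbors_within(adjacency: dict, start: str, max_hops: int) -> dict:
    """Breadth-first search returning {node: hop distance} for nodes within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    del distances[start]
    return distances


adjacency = {
    "DRA": ["GPU Operator"],
    "GPU Operator": ["DRA", "OpenShift AI"],
    "OpenShift AI": ["GPU Operator", "MIG"],
    "MIG": ["OpenShift AI"],
}
one_hop = neighbors_within(adjacency, "DRA", 1)
three_hops = neighbors_within(adjacency, "DRA", 3)
print(one_hop)
print(three_hops)
```

With a single hop only `GPU Operator` is reachable from `DRA`; allowing three hops also reaches `OpenShift AI` and `MIG`, which is the kind of indirect connection a similarity lookup alone would not follow.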
Production Considerations
- Scale: Neo4j handles millions of nodes; for larger graphs consider Neptune or TigerGraph
- Extraction quality: GPT-4o works well; for cost optimization, fine-tune a smaller model on your domain
- Embedding refresh: Re-embed entities when descriptions change significantly
- Graph pruning: Implement TTL-based cleanup for stale relationships
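As a rough sketch of TTL-based pruning, the filter below keeps only relationships updated within a time window. It runs on plain dictionaries so the logic can be checked without a database; `prune_stale` and the 90-day window are hypothetical. The Cypher string shows what an equivalent cleanup might look like against the `RELATES_TO` edges from Step 4, but it has not been run against a live instance here and should be treated as an assumption.

```python
from datetime import datetime, timedelta, timezone

# Possible server-side equivalent (assumption, untested here); ttl_days is a query parameter.
PRUNE_CYPHER = """
MATCH ()-[r:RELATES_TO]->()
WHERE r.updated_at < datetime() - duration({days: $ttl_days})
DELETE r
"""


def prune_stale(edges: list[dict], now: datetime, ttl: timedelta) -> list[dict]:
    """Return only the edges whose updated_at falls inside the TTL window."""
    cutoff = now - ttl
    return [edge for edge in edges if edge["updated_at"] >= cutoff]


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
edges = [
    {"source": "GPU Operator", "target": "DRA", "updated_at": now - timedelta(days=10)},
    {"source": "OpenShift AI", "target": "MIG", "updated_at": now - timedelta(days=200)},
]
kept = prune_stale(edges, now, ttl=timedelta(days=90))
print([(edge["source"], edge["target"]) for edge in kept])
```

Pruning on `updated_at` means a relationship that keeps being re-observed stays alive, because each upsert refreshes its timestamp.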
Related Content
- GBrain: Garry Tan's Self-Wiring Knowledge Graph for AI Agents
- gstack: Garry Tan's Claude Code AI Factory
- OWASP Top 10 for LLM Applications