
CodeMind Edge: Search Your Codebase with AI, No Cloud

CodeMind Edge is a local, privacy-first semantic search engine for codebases built with Qdrant Edge. Ask questions about your code in plain English.

Luca Berton
· 2 min read

What if you could ask your codebase questions in plain English and get back the exact function that answers them? That is what CodeMind Edge does, and it runs entirely on your machine. No cloud. No GPU. No Docker.

What Is CodeMind Edge?

CodeMind Edge is a local, privacy-first semantic search engine for codebases. It is built on Qdrant Edge, an in-process vector database that works like SQLite but for vectors. Embedding, vector storage, and retrieval all run on your machine; only the LLM calls for summarization and query answering go to Azure OpenAI.

The pipeline is straightforward:

  1. Parse a repository into function-level and class-level chunks
  2. Summarize each chunk using an LLM (cached, so it only runs once per function)
  3. Embed everything locally using fastembed (CPU, no GPU required)
  4. Store vectors on disk via Qdrant Edge: no server, no containers
  5. Query with natural language: retrieve semantically relevant code and get an LLM explanation
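Step 1 is the easiest to picture in code. The following is a minimal sketch of function-level and class-level chunking using Python's standard `ast` module. It is not taken from CodeMind Edge's parser.py; the `Chunk` dataclass and `extract_chunks` function are hypothetical names used for illustration, and the sketch only handles Python source.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """One searchable unit of code (hypothetical structure, for illustration)."""
    name: str        # function or class name
    kind: str        # "function" or "class"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    source: str      # source text of the definition (decorators excluded)


def extract_chunks(source: str) -> list[Chunk]:
    """Return one chunk per function and class definition, in file order."""
    tree = ast.parse(source)
    chunks = []
    # ast.walk visits nested nodes too, so methods become their own chunks.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        chunks.append(Chunk(
            name=node.name,
            kind=kind,
            start_line=node.lineno,
            end_line=node.end_lineno,
            source=ast.get_source_segment(source, node),
        ))
    return sorted(chunks, key=lambda chunk: chunk.start_line)
```

Each chunk keeps its line range, so a search hit can point back to the exact location in the file. A class and its methods both appear as chunks here; whether the real parser does the same is not stated in the project description.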

Why This Matters

Most code search tools fall into two categories: keyword search (grep, ripgrep) that misses semantic meaning, or cloud-based AI tools that send your code to external servers. CodeMind Edge sits between the two: semantic search whose index and retrieval stay on your machine.

This is especially relevant for:

  • Regulated industries where code cannot leave the network
  • Large monorepos where keyword search returns too many false positives
  • Onboarding: new team members can ask "how does authentication work?" instead of reading thousands of lines
  • Privacy-conscious teams who want AI assistance without cloud dependencies

Architecture

codemind/
├── parser.py            # Walks repo, extracts function/class chunks
├── Embedder_working.py  # Generates embeddings via fastembed (local CPU)
├── summarizes_this.py   # Generates + caches LLM summaries per chunk
├── store.py             # Qdrant Edge CRUD: upsert, search, count
├── query_usage.py       # Query pipeline: embed → search → return
├── indexer.py           # Orchestrates: parse → summarize → embed → store
├── cli.py               # Terminal interface (typer + rich)
└── server.py            # FastAPI web server

The design is clean. Each component has a single responsibility, and the caching layer (summary-cache.json) means you pay the LLM summarization cost only once per function. Subsequent queries hit the local vector store directly.
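One way such a cache could work is to key each summary by a hash of the chunk's source, so an unchanged function never triggers a second LLM call. This is a sketch under assumptions: the JSON layout, the `SummaryCache` class, and the `summarize` callback are all hypothetical, and the project's actual cache format may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class SummaryCache:
    """Content-addressed summary cache backed by a JSON file (illustrative only)."""

    def __init__(self, path: str):
        self.path = Path(path)
        # Load earlier summaries if the cache file already exists.
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get_or_create(self, chunk_source: str, summarize: Callable[[str], str]) -> str:
        # Hashing the source means an edited function gets a new key
        # and is summarized again, while untouched functions are reused.
        key = hashlib.sha256(chunk_source.encode("utf-8")).hexdigest()
        if key not in self.entries:
            self.entries[key] = summarize(chunk_source)  # the only LLM call
            self.path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[key]
```

In use, `summarize` would wrap the LLM request, and re-indexing a repository would only call it for functions whose source changed since the last run.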

Qdrant Edge: SQLite for Vectors

The most interesting architectural choice is Qdrant Edge, a Rust-compiled in-process vector database. Unlike the full Qdrant server that runs as a separate service, Edge embeds directly into your Python process. No Docker containers, no network calls, no infrastructure to manage.

This is part of a broader trend toward edge-native AI tooling. Similar to how SQLite democratized relational databases by removing the server requirement, Qdrant Edge democratizes vector search.
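To make "in-process" concrete, here is a toy vector store in pure Python with the same three operations the architecture lists for store.py (upsert, search, count). It is emphatically not Qdrant Edge's API: `TinyVectorStore` is a hypothetical class that does brute-force cosine similarity in memory, with no on-disk persistence and no index. It only shows the shape of the idea, that vector search can be a library call inside your process rather than a request to a server.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for an embedded vector database (illustrative only)."""

    def __init__(self):
        self._points = {}  # point id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        # Inserting an existing id overwrites it, hence "upsert".
        self._points[point_id] = (list(vector), payload)

    def count(self):
        return len(self._points)

    def search(self, query, limit=3):
        # Score every stored vector against the query, best match first.
        scored = [
            (point_id, _cosine(query, vector), payload)
            for point_id, (vector, payload) in self._points.items()
        ]
        scored.sort(key=lambda hit: hit[1], reverse=True)
        return scored[:limit]
```

A real embedded database replaces the linear scan with an index and keeps the vectors on disk, but the calling pattern is the same: no connection string, no container, just an object in your program.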

Requirements

  • Python 3.12: Qdrant Edge ships pre-compiled binaries only for 3.12
  • Azure OpenAI account: for summarization and query answering
  • No GPU required: fastembed runs on CPU

Getting Started

git clone https://github.com/Jagritii05/CodeMind-edge.git
cd CodeMind-edge
cp .env.example .env
# Fill in your Azure OpenAI keys
pip install -e .
codemind index ./your-repo
codemind query "how does the retry logic work?"

My Take

CodeMind Edge is a great example of the edge AI trend applied to developer tooling. The combination of local embeddings, on-disk vector storage, and cached summaries means the privacy and latency characteristics are excellent.

The Azure OpenAI dependency, used for summarization and query answering, is the one cloud touchpoint. Because summaries are cached locally, summarization is a one-time cost per function. A future version that uses a local model served through Ollama would make it fully offline.

If you work in a regulated environment or just prefer keeping your code local, this is worth trying. The Qdrant Edge documentation has more details on the underlying vector storage.
