What if you could ask your codebase questions in plain English and get back the exact function that answers them? That is what CodeMind Edge does, and its embeddings, vector storage, and search all run on your machine. No database server. No GPU. No Docker.
What Is CodeMind Edge?
CodeMind Edge is a local, privacy-first semantic search engine for codebases. It is built on Qdrant Edge, an in-process vector database that works like SQLite, but for vectors. Embedding, storage, and retrieval all happen on your machine; the only external calls are the LLM requests that write summaries and explanations.
The pipeline is straightforward:
- Parse a repository into function-level and class-level chunks
- Summarize each chunk using an LLM (cached, so it only runs once per function)
- Embed everything locally using fastembed (CPU, no GPU required)
- Store vectors on disk via Qdrant Edge: no server, no containers
- Query with natural language: retrieve semantically relevant code and get an LLM explanation
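For Python sources, the first step needs nothing beyond the standard library's `ast` module. The sketch below is a hypothetical stand-in for `parser.py`, not its actual code: the `Chunk` fields and the `extract_chunks` name are assumptions, it only looks at top-level definitions, and it handles Python files only.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """One function- or class-level unit of source (hypothetical schema)."""
    kind: str        # "function" or "class"
    name: str
    start_line: int
    end_line: int
    source: str


def extract_chunks(source: str) -> list[Chunk]:
    """Split Python source into top-level function and class chunks."""
    tree = ast.parse(source)
    chunks: list[Chunk] = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # skip imports, assignments, etc.
        # get_source_segment recovers the exact text of the node
        text = ast.get_source_segment(source, node) or ""
        chunks.append(Chunk(kind, node.name, node.lineno, node.end_lineno, text))
    return chunks
```

Here methods stay inside their class chunk; a real indexer might also emit each method separately so that a query can land on a single method rather than the whole class.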
Why This Matters
Most code search tools fall into two categories: keyword search (grep, ripgrep), which misses semantic meaning, or cloud-based AI tools that index your code on external servers. CodeMind Edge sits in the sweet spot: semantic understanding, with embedding and retrieval that stay on your machine.
This is especially relevant for:
- Regulated industries where code cannot leave the network
- Large monorepos where keyword search returns too many false positives
- Onboarding: new team members can ask “how does authentication work?” instead of reading thousands of lines
- Privacy-conscious teams who want AI assistance without cloud dependencies
Architecture
codemind/
├── parser.py # Walks repo, extracts function/class chunks
├── Embedder_working.py # Generates embeddings via fastembed (local CPU)
├── summarizes_this.py # Generates + caches LLM summaries per chunk
├── store.py # Qdrant Edge CRUD: upsert, search, count
├── query_usage.py # Query pipeline: embed → search → return
├── indexer.py # Orchestrates: parse → summarize → embed → store
├── cli.py # Terminal interface (typer + rich)
└── server.py # FastAPI web server
The design is clean. Each component has a single responsibility, and the caching layer (summary-cache.json) means you only pay the LLM summarization cost once per function; after indexing, retrieval hits the local vector store directly.
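The caching idea is easy to sketch. The version below is an assumption about how such a cache could work, not the contents of `summarizes_this.py`: `SummaryCache` is hypothetical, it keys each summary by a SHA-256 hash of the chunk's source and persists the mapping to a JSON file, and the `summarize` argument is a placeholder for the Azure OpenAI call.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class SummaryCache:
    """JSON-backed cache: hash of chunk source -> LLM summary (hypothetical design)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load any summaries persisted by a previous indexing run
        self.data: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    @staticmethod
    def key(source: str) -> str:
        # Hash the code itself, so an edited function gets a fresh summary
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def get_or_summarize(self, source: str, summarize: Callable[[str], str]) -> str:
        k = self.key(source)
        if k not in self.data:
            # Only a cache miss pays for the (remote) LLM call
            self.data[k] = summarize(source)
            self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
        return self.data[k]
```

Keying on a content hash rather than on a file path means an unchanged function moved to another file reuses its summary, while any edit to its source triggers a new one.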
Qdrant Edge: SQLite for Vectors
The most interesting architectural choice is Qdrant Edge, a Rust-compiled in-process vector database. Unlike the full Qdrant server, which runs as a separate service, Edge embeds directly into your Python process: no Docker containers, no network calls, no infrastructure to manage.
This is part of a broader trend toward edge-native AI tooling. Similar to how SQLite democratized relational databases by removing the server requirement, Qdrant Edge democratizes vector search.
Requirements
- Python 3.12: Qdrant Edge ships pre-compiled binaries only for 3.12
- Azure OpenAI account: for summarization and query answering
- No GPU required: fastembed runs on CPU
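Because of the 3.12 constraint, a script that wraps the tool can fail fast with a clear message instead of a confusing install error. A minimal sketch; `check_python_version` is a hypothetical helper, not part of CodeMind Edge.

```python
import sys


def check_python_version(version_info: tuple[int, ...] = tuple(sys.version_info[:2])) -> None:
    """Raise if the interpreter is not Python 3.12 (the stated requirement)."""
    if tuple(version_info[:2]) != (3, 12):
        found = ".".join(map(str, version_info[:2]))
        raise RuntimeError(f"CodeMind Edge needs Python 3.12, found {found}")
```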
Getting Started
git clone https://github.com/Jagritii05/CodeMind-edge.git
cd CodeMind-edge
cp .env.example .env
# Fill in your Azure OpenAI keys
pip install -e .
codemind index ./your-repo
codemind query "how does the retry logic work?"
My Take
CodeMind Edge is a great example of the edge AI trend applied to developer tooling. Local embeddings, on-disk vector storage, and cached summaries mean that retrieval involves no network round-trips and the index never leaves your disk.
Azure OpenAI is the one cloud touchpoint: it writes the chunk summaries and the query explanations. Summaries are cached locally, so that part is a one-time cost per function, although each explained query still makes a call. A future version using a local model like Ollama would make it fully offline.
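Swapping in a local summarizer could look roughly like this sketch, which calls a locally running Ollama server through its `/api/generate` REST endpoint (default port 11434) using only the standard library. The rest is assumption: the `llama3` model name and the prompt text are placeholders, and `build_summary_request` and `summarize_with_ollama` are hypothetical functions, not something CodeMind Edge ships.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_summary_request(source: str, model: str = "llama3") -> dict:
    # Placeholder prompt and model name, not CodeMind Edge's own
    return {
        "model": model,
        "prompt": "Summarize what this code does in one or two sentences:\n\n" + source,
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def summarize_with_ollama(source: str, model: str = "llama3") -> str:
    body = json.dumps(build_summary_request(source, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # The generated text is returned in the "response" field
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama returns a single JSON object holding the full text, which keeps the client to one request per chunk; the function has the same `str -> str` shape as any remote summarizer, so it can be swapped in without touching the cache logic.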
If you work in a regulated environment or just prefer keeping your code local, this is worth trying. The Qdrant Edge documentation has more details on the underlying vector storage.