RAGLite: Lightweight RAG with DuckDB or PostgreSQL and Late Chunking
Building a RAG pipeline usually means stitching together a bunch of heavy dependencies — vector databases, embedding services, orchestration frameworks. It’s powerful, but often overkill if you just want to query a few PDFs or a codebase. RAGLite takes the opposite approach: keep it small, keep it local, and only add complexity when you actually need it.
It uses DuckDB or PostgreSQL as the vector store, relies on your existing SQL skills, and introduces late chunking as a smarter way to handle retrieval. No Docker containers for a small project. No cloud API key just to test an idea.
What It Does
RAGLite is a minimal Python library for retrieval-augmented generation. You give it documents (PDFs, markdown, code, whatever), it chunks them, embeds them, and stores them in DuckDB or PostgreSQL. Then you ask questions, and it retrieves the relevant chunks, hands them to an LLM, and gives you an answer.
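The chunk, embed, store, retrieve, generate loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions. A toy bag-of-words counter stands in for a real embedding model. The "store" is a Python list rather than DuckDB or PostgreSQL. The "generate" step stops at building the prompt instead of calling an LLM. All function names are hypothetical, and none of this is RAGLite's actual API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy embedder: a bag-of-words count vector, NOT a real model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(document: str) -> list[str]:
    # Naive chunker: one chunk per blank-line-separated paragraph.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    # "Store" step: keep (chunk_text, embedding) pairs in memory.
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    # Rank every stored chunk by cosine similarity to the query embedding.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    # "Generate" step: a real pipeline would send this prompt to an LLM.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


docs = [
    "DuckDB runs in-process.\n\nPostgreSQL needs a server.",
    "pgvector adds vector similarity search to PostgreSQL.",
]
index = build_index(docs)
question = "How does PostgreSQL do vector similarity search?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping the toy pieces for a real embedding model, a database table and an LLM call changes the components, not the shape of the loop.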
That part is standard. What makes RAGLite interesting is how it handles chunking and storage.
- Late chunking: Instead of splitting a document into chunks and embedding each chunk on its own, it runs the embedding model over the whole document (or a long section of it) first. Only then does it split the resulting token embeddings into chunks, pooling each span into one chunk vector. Every chunk embedding is therefore computed with the surrounding context in view, so you don't lose meaning at arbitrary chunk boundaries, and this often yields better answers. The repo has a clear diagram showing this in action.
- BYO database: DuckDB for single-user, local use (no server setup). PostgreSQL for multi-user or production, with pgvector for similarity search.
- SQL-based vectors: Embeddings live in ordinary SQL tables, so you can query them with plain SQL. If you know PostgreSQL or DuckDB, you already know how to work with them.
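To make the late-chunking idea concrete, here is a toy Python comparison with naive chunk-then-embed. It assumes a made-up `toy_contextual_encoder` in place of a real long-context embedding model. Each output token vector is the token's own one-hot vector plus a fraction (`alpha`) of the average over the whole input, a crude stand-in for self-attention. All names are hypothetical and this is not RAGLite's implementation. The only difference between the two strategies is whether the encoder sees the whole document before token vectors are mean-pooled per chunk.

```python
import math
import re
from collections import defaultdict

Vector = dict[str, float]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_contextual_encoder(tokens: list[str], alpha: float = 0.5) -> list[Vector]:
    # Hypothetical encoder: every token vector also carries alpha * (mean one-hot
    # vector of ALL input tokens), so each token "sees" the whole input.
    context: Vector = defaultdict(float)
    for t in tokens:
        context[t] += 1.0 / len(tokens)
    vectors = []
    for t in tokens:
        v = {k: alpha * w for k, w in context.items()}
        v[t] = v.get(t, 0.0) + 1.0
        vectors.append(v)
    return vectors


def mean_pool(vectors: list[Vector]) -> Vector:
    pooled: Vector = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            pooled[k] += w / len(vectors)
    return dict(pooled)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def naive_chunk_embeddings(chunks: list[str]) -> list[Vector]:
    # Chunk first, then encode each chunk in isolation.
    return [mean_pool(toy_contextual_encoder(tokenize(c))) for c in chunks]


def late_chunk_embeddings(chunks: list[str]) -> list[Vector]:
    # Encode the whole document once, then mean-pool token vectors per chunk span.
    spans, all_tokens = [], []
    for c in chunks:
        toks = tokenize(c)
        spans.append((len(all_tokens), len(all_tokens) + len(toks)))
        all_tokens.extend(toks)
    token_vectors = toy_contextual_encoder(all_tokens)
    return [mean_pool(token_vectors[start:end]) for start, end in spans]


chunks = ["Berlin is the capital of Germany.", "Its population is about four million."]
query = mean_pool(toy_contextual_encoder(tokenize("Berlin population")))
naive = naive_chunk_embeddings(chunks)
late = late_chunk_embeddings(chunks)
print(f"naive: {cosine(query, naive[1]):.3f}  late: {cosine(query, late[1]):.3f}")
# naive: 0.289  late: 0.329
```

The second chunk never mentions Berlin, so its naively embedded vector has no "berlin" component at all. The late-chunked vector picks one up from the document context and therefore scores higher against the query. That is the effect late chunking is meant to produce, here in miniature.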
Why It’s Cool
Three things jumped out when I looked at the code:
- Late chunking actually works. Most RAG tools split text into fixed-size chunks, which can sever the semantic links between sentences or paragraphs. Late chunking lets each chunk's embedding see the full document context before it is pooled, which tends to make retrieval more accurate. The paper they reference is solid, and the implementation feels lightweight.
- No infrastructure hell. You can run this on a laptop with DuckDB, no server process. When you need to scale, you swap DuckDB for PostgreSQL with a config change. That’s it.
- Transparent embeddings. The embeddings are just columns in ordinary SQL rows. You can inspect them, filter them, and join them with other tables. No black-box vector database magic.
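Because the embeddings are plain row data, ordinary SQL can filter, join and rank them. The sketch below shows that idea with Python's built-in `sqlite3` module as a stand-in, because it needs no server or extra packages. It is not DuckDB or PostgreSQL. The `documents`/`chunks` schema, the float32 BLOB encoding and the `cosine` user-defined function are all hypothetical, not RAGLite's actual schema. On PostgreSQL with pgvector you would instead use a `vector` column and pgvector's `<=>` cosine-distance operator.

```python
import math
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    # Pack the vector as 32-bit floats so it fits in an ordinary BLOB column.
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> list[float]:
    arr = array("f")
    arr.frombytes(blob)
    return arr.tolist()


def cosine_similarity(a: bytes, b: bytes) -> float:
    x, y = from_blob(a), from_blob(b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Expose cosine similarity to SQL as a user-defined function.
    conn.create_function("cosine", 2, cosine_similarity)
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER REFERENCES documents(id),
            body TEXT,
            embedding BLOB
        );
        """
    )
    return conn


def search(conn: sqlite3.Connection, query_vec: list[float], title_pattern: str = "%", k: int = 3):
    # Similarity ranking, a metadata filter and a join in one SQL statement.
    return conn.execute(
        """
        SELECT d.title, c.body, cosine(c.embedding, ?) AS score
        FROM chunks AS c JOIN documents AS d ON d.id = c.document_id
        WHERE d.title LIKE ?
        ORDER BY score DESC
        LIMIT ?
        """,
        (to_blob(query_vec), title_pattern, k),
    ).fetchall()


conn = open_store()
conn.executemany("INSERT INTO documents VALUES (?, ?)", [(1, "handbook.pdf"), (2, "changelog.md")])
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Vacation policy", to_blob([0.9, 0.1, 0.0])),
        (2, 1, "Expense reports", to_blob([0.1, 0.9, 0.0])),
        (3, 2, "Release 2.0 notes", to_blob([0.0, 0.2, 0.9])),
    ],
)
rows = search(conn, [1.0, 0.0, 0.0], title_pattern="%.pdf", k=1)
print(rows)
```

Narrowing the search to PDFs, or joining in any other table, is just more SQL. The vector comparison is one expression inside a normal query rather than a call into a separate service.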
Use cases: internal docum