Why I built this

I got tired of digging through Python and Django docs. You know how it is - you need one simple answer but end up reading five pages. So I decided to build my own Q&A system.

Step 1: Getting the docs

I downloaded the official Python and Django docs. Plain text files. No HTML junk. About 900 files total.

Step 2: Chunking

Split each file into small pieces, each about 500 characters, with a small overlap between pieces so sentences don't get cut in half.
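A minimal sketch of that chunking step as a sliding character window (the function name is mine; the real splitter may also try to align on sentence boundaries):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into ~size-character pieces; neighbors share `overlap` characters."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    step = size - overlap  # advance less than `size` so pieces overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
    return chunks
```

The last chunk can be shorter than 500 characters; that's fine, it still gets indexed.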

Step 3: Vector database

Used ChromaDB. It's lightweight and works well with Python. Stored about 60,000 document pieces.

Step 4: Embeddings

Used a local model called all-MiniLM-L6-v2. Why local? Free and private. No API calls, no costs. The model turns text into numbers (vectors) so the system can find similar content.
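"Find similar content" boils down to comparing vectors, usually by cosine similarity. Here's the idea with toy vectors standing in for real 384-dim all-MiniLM-L6-v2 embeddings (in the real system you'd get them via `SentenceTransformer("all-MiniLM-L6-v2").encode(text)`):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
q = [0.9, 0.1, 0.0]       # "how do I join paths?"
doc_a = [1.0, 0.0, 0.0]   # chunk about os.path.join
doc_b = [0.0, 1.0, 0.0]   # chunk about json.dump
```

Since `q` points almost the same way as `doc_a`, the os.path chunk wins the search.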

Step 5: The Q&A flow

The user types a question. The system turns it into a vector, searches ChromaDB for similar document pieces, sends those pieces plus the question to the DeepSeek API, gets an answer back, and shows it to the user.
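The prompt-assembly part of that flow can be sketched like this. The system-prompt wording and function name are mine, and I've left out the actual DeepSeek call, which takes this OpenAI-style `messages` list:

```python
def build_messages(question, retrieved_chunks):
    """Assemble a chat-style payload from the question and retrieved doc pieces."""
    context = "\n\n---\n\n".join(retrieved_chunks)
    system = (
        "Answer the question using only the documentation excerpts below.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "How do I join two paths?",
    ["os.path.join combines path components."],
)
```

Stuffing the retrieved chunks into the system message keeps the user's question clean and makes it obvious to the model what counts as context.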

Step 6: Streaming

Made answers appear letter by letter. Like ChatGPT. Better user experience. Used Django's StreamingHttpResponse for this.

Step 7: Multi-language

Added the langdetect library. It detects the user's language, and the system tells the AI to answer in that same language. Works for Chinese, English, and French.

Step 8: Knowledge nodes

Each document file became a "knowledge node". When user searches, system shows related nodes on the side. Click to read full doc.
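Building that side panel is just deduplicating the retrieved chunks by source file, keeping the search's rank order (the field names here are my assumption about the stored metadata):

```python
def related_nodes(retrieved_metadatas):
    """Collapse retrieved chunks into unique knowledge nodes, keeping rank order."""
    nodes, seen = [], set()
    for meta in retrieved_metadatas:
        source = meta["source"]  # e.g. "os.path.txt"; field name is my assumption
        if source not in seen:
            seen.add(source)
            nodes.append({"id": source, "title": meta.get("title", source)})
    return nodes
```

Several chunks from the same file collapse into one node, so the panel never shows duplicates.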

Problems I ran into

  1. Memory issues: ChromaDB ate too much RAM during indexing. Fixed by processing files one by one and calling the garbage collector after each one.
  2. Streaming + JSON: Couldn't send both in one response. Solution: stream the answer text first, then a special marker [NODES], then the JSON. The frontend splits on the marker.
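The [NODES] trick from problem 2 fits in a few lines. Here's a sketch of both sides in Python (in reality the parsing half lives in frontend JavaScript; the names are mine):

```python
import json

MARKER = "[NODES]"

def stream_with_nodes(answer_tokens, nodes):
    """Server side: stream the answer text, then the marker, then the node JSON."""
    for token in answer_tokens:
        yield token
    yield MARKER
    yield json.dumps(nodes)

def parse_stream(chunks):
    """Client side: everything before the marker is answer text, after it is JSON."""
    text = "".join(chunks)
    answer, _, payload = text.partition(MARKER)
    return answer, json.loads(payload) if payload else []
```

One caveat this relies on: the marker string must never appear in a real answer, which is why it's a weird bracketed token rather than a plain word.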

What I learned

RAG isn't magic. It's just retrieval plus generation. The hard part is not the AI - it's getting the data ready. Chunk size matters. Overlap matters. Model choice matters.

Also learned that local models are good enough. You don't always need OpenAI.

The final result

  1. 60,000 document chunks
  2. 1,000+ knowledge nodes
  3. Answers in Chinese, English, French
  4. Responses in under 2 seconds
  5. First word appears in under 0.5 seconds

What's next

Maybe add a knowledge graph. Maybe switch to local LLM (Ollama). But for now, it works.

Try it

Go to luluma.xyz/knowledge/. Ask something. It works.