Legal Q&A RAG Chatbot

AI Engineer

2025–2026

Demo Soon

Problem

Legal teams need fast, reliable answers grounded in legislation and contracts. Generic LLMs hallucinate and cannot safely reason over large legal corpora without strong retrieval, filtering, and citation enforcement.

Solution

Built a custom RAG pipeline from scratch (no frameworks) with hybrid retrieval: BM25 keyword search and FAISS semantic vector search, combined using Reciprocal Rank Fusion (RRF). Improved retrieval accuracy by 15–20% using cross-encoder reranking (ms-marco-MiniLM-L-6-v2).

Developed a FastAPI backend with dual-mode responses: Solicitor Mode for technical output and Public Mode for plain-language explanations. Added guardrails for domain filtering, citation enforcement, and PII redaction. Implemented enterprise authentication (JWT + OAuth2 via Google/GitHub/Microsoft) with role-based access control (RBAC). Built a private document corpus for user uploads, with public and private retrieval results combined using RRF.

Deployed with Docker and PostgreSQL (Alembic migrations), plus a Streamlit frontend with protected routes, structured logging, health checks, and metrics collection. Achieved 680+ embeddings/sec throughput via batch processing optimization and delivered 108+ end-to-end tests.
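The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not the project's code: the function name, the document IDs, and the constant k=60 (a commonly used default for RRF) are assumptions made for illustration. Each document scores the sum of 1 / (k + rank) over every ranked list in which it appears.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the keyword and vector retrievers.
bm25_hits = ["s12", "s7", "s3"]
faiss_hits = ["s7", "s44", "s12"]

fused = reciprocal_rank_fusion([bm25_hits, faiss_hits])
print(fused)
```

Because RRF uses only rank positions, BM25 scores and vector similarities never need to be normalized against each other. The same function can fuse public and private corpus results by passing their ranked lists together.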

Key Results

131,253+ document chunks indexed
Sub-3-second average response latency
15–20% retrieval accuracy improvement via RRF + cross-encoder reranking
680+ embeddings/second throughput via batch processing optimization
40% reduction in hallucinations via citation enforcement + guardrails
Enterprise auth: JWT + OAuth2 + RBAC (Google/GitHub/Microsoft)
Private document corpus + public/private fusion with RRF
PostgreSQL + Alembic migrations; 108+ end-to-end tests
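As an illustration of what citation enforcement can look like, here is a minimal sketch that rejects answers containing uncited sentences or citations to sources that were never retrieved. The bracketed-number citation format, the function name, and the sentence-splitting rule are all assumptions for this example, not the project's actual guardrail.

```python
import re

# Assumed citation format: bracketed source numbers such as [1] or [2].
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def enforce_citations(answer, num_sources):
    """Check that every sentence cites at least one retrieved source.

    Returns (ok, problems), where problems lists the offending sentences.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    problems = []
    for sentence in sentences:
        cited = [int(n) for n in CITATION_PATTERN.findall(sentence)]
        if not cited:
            problems.append(f"uncited: {sentence}")
        elif any(n < 1 or n > num_sources for n in cited):
            problems.append(f"unknown source: {sentence}")
    return (not problems, problems)


good = "The notice period is 30 days [1]. Written notice is required [2]."
bad = "The notice period is 30 days [1]. Termination is always permitted."

good_ok, good_problems = enforce_citations(good, num_sources=2)
bad_ok, bad_problems = enforce_citations(bad, num_sources=2)
print(good_ok, bad_ok, bad_problems)
```

A check like this would run after generation, so a failing answer can be regenerated or refused instead of being shown to the user.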

Tech Stack

Python, FastAPI, Docker, PostgreSQL, Alembic, Streamlit, FAISS, BM25, RRF, Cross-encoder reranking (ms-marco-MiniLM-L-6-v2), OpenAI embeddings/LLM, JWT, OAuth2, RBAC, Structured logging, Health checks, Metrics/monitoring

GitHub