How I Deployed a RAG Engine to Production with Docker, Nginx and DigitalOcean

Source: DEV Community
I deployed a full RAG engine (FastAPI + PostgreSQL + pgvector + Redis) on a 4GB RAM VPS for $24/month. This article covers the real deployment architecture: Docker multi-stage builds, PostgreSQL tuned for limited resources, Nginx as a reverse proxy with SSE support, zero-downtime deploys with a maintenance mode, automated backups and cron-based monitoring.

The Context

In the previous article, I built a production RAG pipeline with hybrid search, cross-encoder reranking and a semantic cache. Everything worked perfectly in Docker on my local machine. The problem: getting it to production on a budget VPS without it exploding.

A RAG system isn't a typical CRUD app. It has:

- Embedding models that consume ~500MB of RAM per worker
- PostgreSQL with heavy extensions (pgvector with HNSW indexes)
- SSE streaming that needs long-lived connections
- Redis for rate limiting and caching

All of that competes for the same 4GB of RAM.

Chosen Infrastructure

| Component | Specification |
|---|---|
| VPS | DigitalOcean 4GB RAM / 2 vCPU / 80GB SSD |
| OS | Ubuntu 24.04 LTS |
| Conta