Full-stack SaaS — Nuxt 3, PostgreSQL, Weaviate vector search, BullMQ queues, RAG-backed AI with hybrid Llama/OpenAI routing, WebSocket/WebRTC real-time, and FFmpeg media processing.
A full-stack spiritual platform engineered across five independently scalable domains: scripture content delivery, RAG-backed AI study with hybrid model routing, real-time community interaction, async media processing, and shared infrastructure services.
HTTP request → Nitro server → PostgreSQL scripture query or Weaviate vector retrieval
AI queries routed through custom prompt orchestration → Llama 3.2 8B primary, OpenAI API fallback
Media uploads dispatched to BullMQ → FFmpeg transcoding worker → MinIO object storage
Live events broadcast via WebSocket sync; livestreams run over WebRTC peer connections
Session caching, rate limiting, and cross-instance pub/sub handled by Redis
Domain-driven architecture — each system boundary owns its data access, service logic, and scaling surface without coupling to the others.
Bible Verse is a full-stack SaaS platform built around a domain-driven architecture with five independently scalable system boundaries: scripture content delivery, AI-assisted study, real-time community interaction, async media processing, and infrastructure services.
The core engineering challenge is not storing scripture — it is building a system that delivers contextually accurate AI responses, handles concurrent real-time sessions, and processes media asynchronously without blocking the main request path. Each of these problems requires a different architectural pattern, and the platform composes them into a single deployment unit.
The Nitro server coordinates across domains without owning business logic: resolving scripture from PostgreSQL, triggering Weaviate retrievals for AI context, dispatching BullMQ jobs for media, and publishing Redis events for real-time broadcast.
Scripture content is embedded into Weaviate's vector database. At query time, semantically relevant passages are retrieved and injected into the prompt before inference — grounding AI responses in actual scripture text rather than relying on model memory.
Llama 3.2 8B handles primary inference for cost efficiency. A custom orchestration layer classifies query complexity and routes complex or low-confidence cases to the OpenAI API as a fallback, keeping routine queries cheap while preserving answer quality on edge cases.
Three dedicated queues handle work outside the request path: audio transcoding, notification dispatch, and AI inference batching. No heavy operation blocks API response time — all long-running work is enqueued and processed by dedicated workers.
Community features use WebSockets for message sync and reaction broadcasting. Livestream sessions use WebRTC for peer-to-peer video. Redis pub/sub routes events across server instances so state stays consistent regardless of which node a client connects to.
Audio and video uploads are accepted immediately and enqueued as BullMQ jobs. A dedicated worker runs FFmpeg transformations, writes processed output to MinIO, and notifies the client via WebSocket on completion. Raw uploads never touch the public delivery path.
Bible and Quran content share a single Weaviate embedding collection. A vector similarity query with metadata filters can surface semantically related passages from either corpus in one call — no separate indices, no post-hoc merging.
Bible Verse follows a modular, domain-driven structure. Each bounded context owns its own data access, service layer, and scaling surface. The Nitro server acts as the coordination layer without being a bottleneck — domain services communicate through typed interfaces, not shared tables.
AI study request: client → Nitro API → Weaviate retrieval → prompt construction → Llama inference (or OpenAI fallback) → SSE stream → client.
Livestream session: WebRTC signaling through Nitro → peer connection established → Redis pub/sub synchronizes chat and reactions across all participants.
Five bounded domains compose the platform at runtime. Each domain owns its schema, service logic, and data access — no cross-domain direct database queries.
Bible + Quran — Content Delivery
Bible Logic — RAG-Backed Inference
Live + Community — WebSocket & WebRTC
Upload → Queue → Transcode → Deliver
Sessions · Rate Limits · Data Access
Chat messages, reactions, and community updates needed to stay consistent for users connected to different server instances simultaneously.
Redis pub/sub acts as the cross-instance event bus. WebSocket handlers publish events to a Redis channel; all server instances subscribe and broadcast to their locally connected clients. PostgreSQL is the authoritative state store — Redis is the delivery mechanism only.
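The fan-out pattern can be sketched as follows. This is a minimal illustration, not the platform's code: `InMemoryBus` stands in for Redis pub/sub so the example runs without a server, and `ServerInstance`, `ChatEvent`, and the channel name passed in are hypothetical names.

```typescript
// In-memory stand-in for Redis pub/sub so the sketch runs without a server.
type BusHandler = (payload: string) => void;

class InMemoryBus {
  private handlers = new Map<string, BusHandler[]>();

  subscribe(channel: string, handler: BusHandler): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  publish(channel: string, payload: string): void {
    (this.handlers.get(channel) ?? []).forEach((handler) => handler(payload));
  }
}

interface ChatEvent {
  roomId: string;
  userId: string;
  text: string;
}

class ServerInstance {
  // What each locally connected client has received, keyed by client id.
  readonly delivered = new Map<string, ChatEvent[]>();
  private rooms = new Map<string, string>(); // clientId -> roomId
  private bus: InMemoryBus;
  private channel: string;

  constructor(bus: InMemoryBus, channel: string) {
    this.bus = bus;
    this.channel = channel;
    // Every instance subscribes, including the one that publishes.
    bus.subscribe(channel, (payload) => this.broadcastLocal(JSON.parse(payload) as ChatEvent));
  }

  connect(clientId: string, roomId: string): void {
    this.rooms.set(clientId, roomId);
    this.delivered.set(clientId, []);
  }

  // Called by the WebSocket handler; persisting to PostgreSQL first is an assumed ordering.
  handleIncoming(event: ChatEvent): void {
    this.bus.publish(this.channel, JSON.stringify(event));
  }

  private broadcastLocal(event: ChatEvent): void {
    this.rooms.forEach((roomId, clientId) => {
      if (roomId === event.roomId) this.delivered.get(clientId)?.push(event);
    });
  }
}
```

Because the publishing instance is also a subscriber, the sender's own node delivers through the same path as every other node, which keeps one code path for local and remote clients.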
General-purpose models hallucinate or over-generalize when answering specific scripture-referenced or theological questions.
A RAG pipeline retrieves relevant passages from Weaviate before inference. The prompt is assembled with retrieved context injected before the user query. The orchestration layer escalates low-confidence queries from Llama to the OpenAI API rather than letting the local model guess.
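Prompt assembly with retrieved context placed ahead of the user query might look like the sketch below. The `RetrievedPassage` shape, the `buildGroundedPrompt` name, the instruction wording, and the default of three passages are all assumptions for illustration; the source does not specify them.

```typescript
interface RetrievedPassage {
  reference: string; // human-readable verse reference
  text: string;
  distance: number; // vector distance, lower means closer
}

// Builds a prompt with the closest passages first, then the user question.
function buildGroundedPrompt(
  question: string,
  passages: RetrievedPassage[],
  maxPassages: number = 3,
): string {
  const context = passages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.distance - b.distance)
    .slice(0, maxPassages)
    .map((p, i) => `[${i + 1}] ${p.reference}: ${p.text}`)
    .join("\n");

  return [
    "Answer using only the passages below. If they do not cover the question, say so.",
    "",
    "Passages:",
    context.length > 0 ? context : "(no passages retrieved)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages gives the model something concrete to cite, and the explicit "say so" instruction gives the orchestration layer a signal it could treat as low confidence.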
Audio and video uploads require CPU-intensive FFmpeg transcoding that cannot run synchronously in the API request path without causing timeouts.
Uploads are accepted immediately and enqueued as BullMQ jobs. A dedicated worker handles FFmpeg transforms, writes output to MinIO, updates the database record, and notifies the client via WebSocket. The API layer remains non-blocking throughout.
Users needed meaning-based passage discovery across Bible and Quran without knowing exact verse references. Keyword search alone was not sufficient.
Both corpora share a single Weaviate collection. Vector similarity queries with metadata filters scope results to one or both texts. A single retrieval call surfaces related passages from different scriptures without separate indices or post-hoc merging.
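Scoping a query to one or both texts comes down to building the right metadata filter. The sketch below produces a plain object in the shape of Weaviate's GraphQL `where` argument (`path`, `operator`, `valueText`, `operands`); the exact shape depends on the client version in use, so treat it as an assumption to verify. `scopeFilter` and `equalsFilter` are hypothetical helper names.

```typescript
type Corpus = "bible" | "quran";

interface WhereLeaf {
  path: string[];
  operator: "Equal";
  valueText: string;
}

interface WhereGroup {
  operator: "And" | "Or";
  operands: WhereFilter[];
}

type WhereFilter = WhereLeaf | WhereGroup;

function equalsFilter(property: string, value: string): WhereLeaf {
  return { path: [property], operator: "Equal", valueText: value };
}

// "both" means no corpus restriction: the shared collection is searched as a whole.
function scopeFilter(corpus: Corpus | "both", translation?: string): WhereFilter | undefined {
  const parts: WhereFilter[] = [];
  if (corpus !== "both") parts.push(equalsFilter("corpus", corpus));
  if (translation !== undefined) parts.push(equalsFilter("translation", translation));

  if (parts.length <= 1) return parts[0]; // undefined when no filter applies
  return { operator: "And", operands: parts };
}
```

Returning `undefined` for the unrestricted case lets the caller omit the filter entirely, which is what makes cross-corpus retrieval a single call rather than two queries merged afterwards.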
Routing all queries through the OpenAI API is cost-prohibitive at scale. Running all queries through the local model alone risks quality degradation on complex theological edge cases.
The orchestration layer classifies query complexity using lightweight heuristics before inference. Standard Q&A routes to Llama 3.2 8B locally. Queries exceeding a complexity threshold escalate to the OpenAI API. The routing is transparent to the user — only the backend path changes.
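A heuristic router of this kind could be sketched as below. The specific signals (query length, number of verse references, comparison keywords, multiple question marks), their weights, and the default threshold are invented for illustration; the source only states that lightweight heuristics and a threshold are used.

```typescript
type ModelRoute = "llama-local" | "openai-fallback";

// Hypothetical scoring signals; each adds to a small integer score.
function complexityScore(query: string): number {
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
  const references = query.match(/[A-Z][a-z]+ \d+:\d+/g) ?? [];
  const questionMarks = query.match(/\?/g) ?? [];

  let score = 0;
  if (words.length > 40) score += 2; // long, multi-part questions
  if (references.length >= 2) score += 2; // several passages to relate
  if (/\b(compare|contrast|reconcile|interpret|why)\b/i.test(query)) score += 1;
  if (questionMarks.length > 1) score += 1; // more than one question at once
  return score;
}

// Queries whose score exceeds the threshold escalate to the fallback model.
function routeQuery(query: string, threshold: number = 2): ModelRoute {
  return complexityScore(query) > threshold ? "openai-fallback" : "llama-local";
}
```

Keeping the classifier a pure function means routing decisions can be replayed offline against logged queries when tuning the threshold.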
Five functionally distinct domains needed clear boundaries, but splitting into separate services would introduce network latency, deployment complexity, and distributed tracing overhead at current scale.
Domains are isolated as typed service modules within the monorepo. No domain accesses another's database tables directly. Each exposes a contract consumed by the Nitro API layer, which gives most of the isolation benefits of microservices while remaining a single deployable unit until scale justifies extraction.
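A contract-based boundary between two domains might be sketched like this. `ScriptureService`, `createScriptureService`, and `createStudyService` are illustrative names, and the in-memory rows stand in for the Scripture domain's private repository layer.

```typescript
interface VerseRef {
  book: string;
  chapter: number;
  verse: number;
}

// Contract exposed by the Scripture domain; other domains see only this.
interface ScriptureService {
  getVerseText(ref: VerseRef): string | undefined;
}

// Scripture domain implementation. Its storage stays private to the module.
function createScriptureService(rows: Array<VerseRef & { text: string }>): ScriptureService {
  const keyOf = (ref: VerseRef): string => `${ref.book}:${ref.chapter}:${ref.verse}`;
  const index = new Map<string, string>();
  rows.forEach((row) => index.set(keyOf(row), row.text));
  return { getVerseText: (ref) => index.get(keyOf(ref)) };
}

// AI domain: depends on the contract, never on Scripture tables.
function createStudyService(scripture: ScriptureService) {
  return {
    citeVerse(ref: VerseRef): string {
      const label = `${ref.book} ${ref.chapter}:${ref.verse}`;
      const text = scripture.getVerseText(ref);
      return text === undefined ? `No text found for ${label}` : `${label} - ${text}`;
    },
  };
}
```

Since the consumer only needs an object that satisfies `ScriptureService`, the implementation behind it could later become a network client without changing the AI domain.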
BullMQ workers run outside the HTTP request cycle, making them hard to test deterministically — standard API testing tools don't reach the processor functions directly.
Each worker exports its processor function independently of the queue registration. Unit tests call the processor directly with fixture data. Integration tests spin up a dedicated Redis test instance and verify the full enqueue → process → database update → WebSocket notification cycle end-to-end.
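Separating the processor from queue registration can be sketched as below. `processTranscodeJob`, `TranscodeJobData`, and the `TranscodeDeps` interface are hypothetical; the FFmpeg, database, and WebSocket steps are injected as functions so the processor can be called directly with fakes.

```typescript
interface TranscodeJobData {
  mediaId: string;
  sourceKey: string;
}

// Side effects are injected so tests can replace them with fakes.
interface TranscodeDeps {
  transcode(sourceKey: string): Promise<string>; // resolves to the output object key
  markProcessed(mediaId: string, outputKey: string): Promise<void>;
  notify(mediaId: string, status: "ready" | "failed"): Promise<void>;
}

// Plain async function: no queue and no Redis connection required to call it.
async function processTranscodeJob(data: TranscodeJobData, deps: TranscodeDeps): Promise<string> {
  try {
    const outputKey = await deps.transcode(data.sourceKey);
    await deps.markProcessed(data.mediaId, outputKey);
    await deps.notify(data.mediaId, "ready");
    return outputKey;
  } catch (error) {
    await deps.notify(data.mediaId, "failed");
    throw error; // rethrow so the queue can apply its own retry policy
  }
}

// Queue registration would live in a separate module and only wrap the
// processor, roughly (BullMQ, with an assumed queue name and connection):
//   new Worker("media-transcode", (job) => processTranscodeJob(job.data, deps), { connection });
```

A unit test then passes fixture data and recording fakes to `processTranscodeJob` and asserts on the order of calls, leaving the Redis-backed integration test to cover the queue wiring itself.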
WebRTC peer connections require a signaling channel to exchange SDP offers and ICE candidates. If the signaling server drops mid-negotiation, the connection never completes and the session is silently lost.
The Nitro WebSocket handler maintains a per-session signaling state machine. If a client reconnects during negotiation, the server replays the last unacknowledged signaling message. ICE candidate buffering prevents race conditions between offer/answer exchange and candidate delivery.
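The replay and buffering behaviour can be modelled with a small per-session class. This is a simplified sketch: `SignalingSession` and its method names are hypothetical, payloads are opaque strings rather than real SDP or ICE data, and only one unacknowledged message is tracked at a time.

```typescript
type SignalType = "offer" | "answer" | "candidate";

interface SignalMessage {
  seq: number;
  type: SignalType;
  payload: string;
}

class SignalingSession {
  // Candidates released to the peer, in delivery order.
  readonly appliedCandidates: string[] = [];
  private nextSeq = 1;
  private lastUnacked: SignalMessage | undefined;
  private descriptionSet = false;
  private bufferedCandidates: string[] = [];

  // Record each outgoing message until the client acknowledges it.
  send(type: SignalType, payload: string): SignalMessage {
    const message: SignalMessage = { seq: this.nextSeq, type, payload };
    this.nextSeq += 1;
    this.lastUnacked = message;
    return message;
  }

  ack(seq: number): void {
    if (this.lastUnacked?.seq === seq) this.lastUnacked = undefined;
  }

  // On reconnect, return the message the client never confirmed, if any.
  onReconnect(): SignalMessage | undefined {
    return this.lastUnacked;
  }

  // Candidates arriving before the offer/answer is applied are held back.
  receiveCandidate(candidate: string): void {
    if (this.descriptionSet) this.appliedCandidates.push(candidate);
    else this.bufferedCandidates.push(candidate);
  }

  // Once the description is applied, flush the buffer in arrival order.
  markDescriptionSet(): void {
    this.descriptionSet = true;
    this.appliedCandidates.push(...this.bufferedCandidates);
    this.bufferedCandidates = [];
  }
}
```

Holding candidates until the description is applied avoids the case where a candidate reaches a peer that has no remote description to attach it to.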
Vue 3 Composition API throughout — no Options API. SSR on public scripture and devotional pages for SEO; SPA mode for authenticated dashboard routes. Pinia stores scoped per domain. Nuxt auto-imports remove component boilerplate. Strict TypeScript shared between client and server layer.
H3 event handlers with Zod schema validation at every public endpoint. Service layer pattern — controllers are thin, logic lives in typed service modules. Middleware chain: auth → rate limit → logging → handler. Async work dispatched to BullMQ queues rather than awaited in the response path.
Five bounded contexts: Scripture, AI Reasoning, Community, Media, and Infrastructure. Each owns its schema, service logic, and repository layer. Cross-domain calls go through typed service interfaces — no direct DB table access across boundaries. Enables future service extraction without a rewrite.
PostgreSQL + Prisma for relational content and user data with typed migrations. Weaviate vectors indexed by book, chapter, and verse with metadata filters for scoped retrieval. Redis for session store, rate-limit counters, and pub/sub. MinIO for S3-compatible object storage with signed URL delivery.
JWT-based auth with short-lived access tokens and Redis-backed refresh token rotation. Redis tracks active sessions enabling server-side revocation without waiting on token expiry. Rate limiting applied per-IP and per-user at the Nitro middleware layer before any handler executes. Role-based access enforced at the service layer, not just route guards.
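Refresh token rotation with server-side revocation can be sketched as follows. A `Map` stands in for Redis, `RefreshTokenStore` is a hypothetical name, and the token string is a readable placeholder; a real implementation would generate tokens with a cryptographically secure random source.

```typescript
interface SessionRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

class RefreshTokenStore {
  private sessions = new Map<string, SessionRecord>(); // stand-in for Redis
  private counter = 0;
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  issue(userId: string): string {
    this.counter += 1;
    const token = `rt_${userId}_${this.counter}`; // placeholder, not secure
    this.sessions.set(token, { userId, expiresAt: this.now() + this.ttlMs });
    return token;
  }

  // Rotation: the presented token is consumed and a new one replaces it.
  rotate(token: string): string | undefined {
    const record = this.sessions.get(token);
    this.sessions.delete(token);
    if (record === undefined || record.expiresAt <= this.now()) return undefined;
    return this.issue(record.userId);
  }

  // Server-side revocation: drop every active session for the user.
  revokeAll(userId: string): void {
    const tokens: string[] = [];
    this.sessions.forEach((record, token) => {
      if (record.userId === userId) tokens.push(token);
    });
    tokens.forEach((token) => this.sessions.delete(token));
  }
}
```

Because each refresh token is single-use, a replayed token simply fails to rotate, and `revokeAll` takes effect without waiting for any access token to expire.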
Strict TypeScript across frontend, Nitro server, and shared utilities. Zod schemas at API boundaries generate both runtime validation and inferred TypeScript types — one source of truth. Prisma client types flow from the schema; no hand-written model interfaces. ESLint and Prettier enforced via pre-commit hooks.
Every request generates a structured log entry with a correlation ID threaded through all downstream calls — PostgreSQL queries, Weaviate retrievals, BullMQ dispatches, and AI inference. Errors include stack traces, request context, and the authenticated user ID. Log level controlled per environment via runtime config.
Each external dependency (PostgreSQL, Redis, Weaviate, MinIO) has a dedicated health check endpoint polled by the load balancer. The AI inference path wraps Llama and OpenAI calls in a circuit breaker: if the error rate exceeds a configured threshold, new requests route to the fallback immediately rather than queuing behind timeouts.
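An error-rate circuit breaker can be reduced to a small state holder, sketched below. The rolling window size, failure rate, and cooldown are assumed parameters, `CircuitBreaker` is a hypothetical class, and the sketch omits a half-open probing state for brevity.

```typescript
interface BreakerOptions {
  windowSize: number; // number of recent calls considered
  failureRate: number; // open when failures / windowSize exceeds this
  cooldownMs: number; // how long to stay open before retrying the primary
}

class CircuitBreaker {
  private outcomes: boolean[] = []; // rolling window, true = failure
  private openedAt: number | undefined;
  private options: BreakerOptions;
  private now: () => number;

  constructor(options: BreakerOptions, now: () => number = () => Date.now()) {
    this.options = options;
    this.now = now;
  }

  // True when the primary model may be called; false means use the fallback.
  allowPrimary(): boolean {
    if (this.openedAt === undefined) return true;
    if (this.now() - this.openedAt >= this.options.cooldownMs) {
      // Cooldown elapsed: close the breaker and start a fresh window.
      this.openedAt = undefined;
      this.outcomes = [];
      return true;
    }
    return false;
  }

  // Record the outcome of a primary call and open the breaker if needed.
  record(failed: boolean): void {
    this.outcomes.push(failed);
    if (this.outcomes.length > this.options.windowSize) this.outcomes.shift();
    const failures = this.outcomes.filter((f) => f).length;
    const windowFull = this.outcomes.length === this.options.windowSize;
    if (windowFull && failures / this.options.windowSize > this.options.failureRate) {
      this.openedAt = this.now();
    }
  }
}
```

The caller checks `allowPrimary()` before each inference request and goes straight to the fallback when it returns `false`, so a failing model is skipped instead of accumulating timeouts.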
Prometheus metrics track request latency, BullMQ queue depth, worker throughput, and AI inference time per model. Grafana dashboards surface queue backlog and Redis memory pressure. Alertmanager notifies on sustained queue stalls, high error rates, and inference latency spikes.
Docker Compose for local development with full service parity — same PostgreSQL, Redis, Weaviate, and MinIO versions as production. Environment config managed via Nuxt runtime config. CI pipeline runs lint, type check, and integration tests before any deploy step executes.
Scripture modelled at verse granularity — Books → Chapters → Verses with translation variants in a separate table to support multi-translation queries without denormalization. B-tree indexes on verse reference columns; GIN indexes on full-text search fields. User data isolated in separate schemas with FK constraints enforced at the DB layer.
A single ScriptureVerse collection stores embeddings for both Bible and Quran. Properties include text, translation, corpus (bible|quran), book, chapter, and verse. The corpus and translation fields act as metadata filters, enabling single-query cross-corpus retrieval without managing separate collections. Vectorizer: text2vec-transformers with a multilingual model.
Four namespaces: session:{userId} for refresh tokens (7-day TTL), ratelimit:{ip}:{route} for sliding window counters, presence:{roomId} for WebSocket connection state (30s TTL with heartbeat renewal), and events:{instanceId} pub/sub channels for cross-instance broadcast. No business data lives in Redis.
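The key layout and the sliding window counter can be sketched as below. The key formats and TTLs follow the namespaces described above; `redisKeys`, `TTL_SECONDS`, and `SlidingWindowLimiter` are hypothetical names, and the limiter keeps timestamps in memory where production would keep them in Redis.

```typescript
// Key builders for the four namespaces.
const redisKeys = {
  session: (userId: string): string => `session:${userId}`,
  rateLimit: (ip: string, route: string): string => `ratelimit:${ip}:${route}`,
  presence: (roomId: string): string => `presence:${roomId}`,
  events: (instanceId: string): string => `events:${instanceId}`,
};

const TTL_SECONDS = {
  session: 7 * 24 * 60 * 60, // 7 days
  presence: 30, // renewed by heartbeat
};

// In-memory sliding window: keeps the timestamps of recent hits per key.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    // Drop hits that have slid out of the window, then count what remains.
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

A middleware would call `allow(redisKeys.rateLimit(ip, route), Date.now())` before the handler runs; unlike a fixed window, the sliding window does not let a burst straddle a boundary and double the effective limit.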
All schema changes go through Prisma Migrate, with no hand-written SQL in production. Migrations run in CI before the deploy step; the pipeline blocks if pending migrations are detected. A shadow database catches destructive migration issues in development before they reach staging. Seeder scripts populate translations and scripture content for local dev.