Systems Thinking

Donavan Jones · Published January 14, 2026 · infrastructure-engineering

Introduction

Infrastructure is often treated as a collection of individual tools (Kubernetes, Docker, CI/CD pipelines, databases, and services), but real stability comes from understanding how those parts interact as a unified system. My homelab consists of a Raspberry Pi-based K3s cluster, a separate development machine with an RTX 3090, and supporting services like Gitea and CI runners. Running it has taught me that every component has ripple effects across the entire architecture: a change in one layer, whether compute, networking, storage, or deployment, inevitably affects the others.

This mindset shift is what systems thinking is about: designing not just for functionality, but for relationships, dependencies, and failure modes. Instead of asking “does this service work?”, I started asking “how does this service behave under load, failure, or change, and what else does it impact?”

Understanding Infrastructure as a System

My homelab evolved from a simple cluster into a multi-layered system:

A K3s Raspberry Pi cluster handling container orchestration and lightweight workloads
A separate development machine with an RTX 3090 running local models in Docker containers
A Gitea instance running inside the cluster, acting as the central code and CI/CD hub
CI runners deployed into Kubernetes, acting as the bridge between code commits and deployments
A growing set of microservices and applications, including parts of my Bible app ecosystem

At first, these were separate experiments. Over time, they became interconnected subsystems. The cluster doesn’t just “run apps”—it is the execution layer of a broader pipeline that starts at development on my local machine and ends in production-like deployments inside Kubernetes.

Feedback Loops and Dependencies

One of the biggest realizations in applying systems thinking is that feedback loops matter more than individual components.

For example:

A CI pipeline failure isn’t just a build issue—it affects deployment velocity and confidence in automation
Resource constraints on the Raspberry Pi cluster influence how I design services (lighter images, fewer dependencies, better caching)
Running AI workloads on a separate GPU machine forces me to design APIs between inference services and application services
Gitea becomes more than version control—it becomes the coordination layer for the entire system

Each part feeds information back into how I design the next part. The system teaches me how to build it.
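
One way to make these dependencies concrete is to write them down as a graph and ask what a change to one component can reach. The sketch below is illustrative only: the component names echo the homelab described above, but the edges in DEPENDS_ON and the helper impacted_by are assumptions made for the example, not a record of the real setup.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the components it lists.
# The edges are assumptions for illustration, not the actual homelab wiring.
DEPENDS_ON = {
    "k3s-cluster": [],
    "gpu-machine": [],
    "gitea": ["k3s-cluster"],
    "ci-runners": ["k3s-cluster", "gitea"],
    "inference-api": ["gpu-machine"],
    "bible-app": ["k3s-cluster", "inference-api"],
}


def impacted_by(component: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `component`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk over the inverted graph.
    seen: set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    for name in DEPENDS_ON:
        print(name, "->", sorted(impacted_by(name, DEPENDS_ON)))
```

Even a toy map like this makes the ripple effects visible before a change is made: under these assumed edges, touching the cluster reaches almost everything, while touching the GPU machine reaches only the inference path and the application that calls it.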

Failure Domains and Isolation

Another key principle is isolation. In a poorly designed system, one failure cascades everywhere. In a well-designed system, failures are contained.

In my setup:

The GPU dev machine is isolated from the cluster so heavy inference workloads don’t disrupt orchestration
Kubernetes namespaces separate experimental workloads from core services
CI runners are treated as disposable infrastructure, not critical stateful components
Storage and stateful services are carefully separated from stateless application layers

This separation allows me to experiment aggressively without risking the entire system collapsing.
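
Isolation at the infrastructure level can be mirrored in application code. One common containment technique is a circuit breaker, which stops calling a failing dependency for a cool-down period and serves a fallback instead. The sketch below is a generic illustration and not necessarily what this homelab runs; the class name, its parameters, and the idea of wrapping calls to an inference service are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While open, skip the dependency entirely until the cool-down passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure during the trial re-opens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success closes the breaker fully
        return result
```

A caller would wrap each request to the remote service, for example breaker.call(lambda: query_model(prompt), lambda: cached_answer), where query_model and cached_answer are hypothetical stand-ins. The design choice is that a struggling dependency costs its callers one fast fallback rather than a pile of slow timeouts.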

Scaling Through Composition

Instead of scaling vertically (bigger machines), I’ve leaned into scaling through composition—adding small, well-defined systems that plug into the existing architecture.

Examples include:

Adding new microservices to the cluster without modifying existing ones
Extending CI pipelines rather than rewriting them
Treating AI models as services rather than embedded logic
Building new applications (like parts of the Bible app) as independent modules that communicate through APIs

This approach keeps the system flexible. Because each addition talks to the rest of the system only through a defined interface, it extends the architecture without forcing changes to what already works.
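
The idea of adding services without modifying existing ones can be sketched in a few lines. This is a simplified in-process illustration, assuming a name-based registry; in a real cluster the same role would be played by service discovery and HTTP APIs. ServiceRegistry, verse_lookup, and summarize are hypothetical names invented for the example.

```python
from typing import Callable, Dict

# A handler takes a request payload and returns a response payload.
Handler = Callable[[dict], dict]


class ServiceRegistry:
    """Route requests by service name; new services register without touching old ones."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Refuse silent overrides so an addition cannot change existing behavior.
        if name in self._handlers:
            raise ValueError(f"service {name!r} is already registered")
        self._handlers[name] = handler

    def call(self, name: str, payload: dict) -> dict:
        if name not in self._handlers:
            raise KeyError(f"unknown service {name!r}")
        return self._handlers[name](payload)


# Hypothetical services standing in for independently deployed modules.
def verse_lookup(payload: dict) -> dict:
    return {"reference": payload["reference"], "source": "verse-lookup"}


def summarize(payload: dict) -> dict:
    # A real version would call a model served elsewhere; this one just truncates.
    return {"summary": payload["text"][:20], "source": "summarizer"}


registry = ServiceRegistry()
registry.register("verse-lookup", verse_lookup)
registry.register("summarizer", summarize)  # added later; verse_lookup is unchanged
```

Registering the second service required no edits to the first, which is the property the list above describes: the system grows by adding entries, not by rewriting existing ones.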

Observability as a First-Class Concern

In systems thinking, you can’t improve what you can’t see. Observability has become a core part of my infrastructure design.

Logs, metrics, and deployment feedback loops across my cluster help me understand:

Where bottlenecks occur in CI/CD pipelines
How workloads behave under resource pressure on the Pi cluster
When services degrade before they fully fail
How deployment changes affect system stability

Without observability, the system becomes guesswork. With it, the system becomes readable.
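
As a rough illustration of catching degradation before outright failure, the sketch below keeps a rolling window of request latencies and flags the service once the 95th percentile crosses a threshold. The class name, window size, and threshold are assumptions chosen for the example; a real deployment would more likely rely on a metrics stack than on hand-rolled code.

```python
import math
from collections import deque
from typing import Deque


class LatencyMonitor:
    """Track recent request latencies and flag degradation before hard failure."""

    def __init__(self, window: int = 50, threshold_ms: float = 500.0) -> None:
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self.samples: Deque[float] = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> float:
        """Nearest-rank 95th percentile of the current window (0.0 if empty)."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return ordered[rank - 1]

    def degraded(self) -> bool:
        # Degraded means slow for the tail of requests, even if none have failed yet.
        return self.p95() > self.threshold_ms
```

Watching a tail percentile rather than the average is a deliberate choice here: a handful of slow requests on a resource-constrained node shows up in the p95 well before it moves the mean.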

Conclusion

Systems thinking has completely changed how I approach infrastructure. My homelab is no longer a collection of tools—it is a living architecture where every decision has downstream effects. The Raspberry Pi cluster, the GPU development machine, Gitea, CI/CD pipelines, and my application stack all function as interconnected parts of a larger system rather than isolated projects.

The goal is no longer just to “deploy things,” but to design a system that can evolve, fail safely, and scale through composition. That shift—from tools to systems—is what makes the difference between a fragile setup and a resilient one.
