DNS and Service Discovery

Introduction

In a Kubernetes-based homelab, DNS and service discovery are what make the entire system feel “alive” instead of just a collection of isolated containers. In my own setup—running a K3s cluster across a small rack of machines and Raspberry Pi nodes, along with heavier workloads offloaded to a separate RTX 3090 development machine—service discovery is the backbone that allows everything to communicate without hardcoding IPs.

As the cluster grows to include services like my Bible app, Gitea CI runners, AI microservices, and backend APIs, DNS becomes essential for keeping communication stable even when pods restart, scale, or move between nodes. Instead of chasing IP addresses, services simply find each other by name.


How Kubernetes DNS Works

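Most Kubernetes distributions, K3s included, run CoreDNS as the cluster DNS server to resolve service names inside the cluster. Every Service automatically gets a stable DNS name of the form <service>.<namespace>.svc.cluster.local, where cluster.local is the default cluster domain.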

For example:

  • my-service.default.svc.cluster.local as the fully qualified name
  • my-service.default from any namespace, or just my-service from pods in the same namespace

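Behind that name sits the Service's ClusterIP, which stays the same for the lifetime of the Service, so any pod can reach another service through a consistent name instead of chasing pod IPs that change on every restart.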
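The short names work because the kubelet writes search domains into every pod's /etc/resolv.conf. For a pod in the default namespace this typically reads "search default.svc.cluster.local svc.cluster.local cluster.local" with "options ndots:5" (host search domains may be appended). Here is a minimal sketch of glibc-style expansion under those assumptions; the function name is purely illustrative:

```python
# Sketch: how a glibc-style stub resolver expands a name inside a pod.
# Assumes the kubelet's ClusterFirst resolv.conf with cluster domain
# "cluster.local" and ndots:5; any host search domains are ignored here.


def candidate_queries(name: str, namespace: str = "default",
                      cluster_domain: str = "cluster.local",
                      ndots: int = 5) -> list[str]:
    """Return the absolute names the resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already absolute: no search list.
        return [name]

    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."

    if name.count(".") >= ndots:
        # Enough dots to look fully qualified: try it as-is first.
        return [absolute] + expanded
    # Fewer dots than ndots: walk the search list, bare name last.
    return expanded + [absolute]


if __name__ == "__main__":
    for query in candidate_queries("my-service"):
        print(query)
```

The first candidate printed is my-service.default.svc.cluster.local., which is why the bare name resolves within the same namespace. A name like my-service.default misses on the first candidate and matches on the second. The trade-off of ndots:5 is that external names such as api.openai.com also walk the whole search list before being tried as-is.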

In my homelab rack, this is especially useful because nodes can reboot, pods reschedule, and workloads shift between ARM-based Pis and more powerful x86 machines. DNS abstracts all of that complexity away.


Service Discovery in Practice

Service discovery in Kubernetes happens through Services, Endpoints, and labels.

  • Services act as stable entry points.
  • Labels & selectors define which pods belong to which service.
  • Endpoints (tracked as EndpointSlices in current Kubernetes versions) update automatically as matching pods become ready, restart, or disappear.
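To make the Service, selector and Endpoints relationship concrete, here is a toy model of what the endpoints controller does. It is a simplification: only equality-based selectors, a single numeric port, and not-ready pods simply left out (the real controller lists them separately as not-ready addresses). The pod names, labels and IPs are hypothetical; 10.42.0.0/16 is K3s's default pod CIDR.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict[str, str] = field(default_factory=dict)
    ready: bool = True


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def compute_endpoints(selector: dict[str, str], pods: list[Pod], port: int) -> list[str]:
    """Return sorted ip:port endpoints for ready pods that match the selector."""
    if not selector:
        # Services without a selector get no automatically managed endpoints.
        return []
    return sorted(
        f"{pod.ip}:{port}"
        for pod in pods
        if pod.ready and matches(selector, pod.labels)
    )


if __name__ == "__main__":
    pods = [
        Pod("bible-api-1", "10.42.0.12", {"app": "bible-api"}),
        Pod("bible-api-2", "10.42.1.7", {"app": "bible-api"}, ready=False),
        Pod("bible-web-1", "10.42.2.3", {"app": "bible-web"}),
    ]
    print(compute_endpoints({"app": "bible-api"}, pods, 8080))
```

Running it prints ['10.42.0.12:8080']: the not-ready replica receives no traffic until it passes its readiness probe, and the web pod is excluded by the selector. Neither change touches the Service name or its ClusterIP.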

In my setup, this is what allows things like:

  • The Bible app frontend talking to backend APIs without knowing where they are running
  • AI services in Docker containers on my dev machine being reachable from cluster workloads, once they are exposed on the network and mapped to a name the cluster can resolve
  • Gitea CI/CD runners picking up jobs and reaching the right services by name, whichever node they land on

Everything is loosely coupled but still deeply connected.


DNS in a Hybrid Homelab (Rack + Dev Machine)

My setup isn’t a pure cloud-native cluster—it’s hybrid:

  • K3s running on a small homelab rack (multiple nodes)
  • A separate development machine with an RTX 3090 running AI containers
  • Gitea running inside the cluster acting as the glue for CI/CD
  • Services deployed via manifests pushed through repositories

This makes DNS even more important because not everything is in one place. Some services live inside the cluster, while others are exposed externally and consumed internally.
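One way to give an out-of-cluster service, such as a container on the RTX 3090 machine, a cluster DNS name is a Service with no selector plus a manually managed Endpoints object that carries the machine's IP. The sketch below builds both as a kubectl-compatible List in JSON. The name ai-inference, the namespace ai and the address 192.168.1.50:8000 are hypothetical placeholders, not values from my setup.

```python
import json


def external_service(name: str, namespace: str, ip: str, port: int) -> dict:
    """Build a selector-less Service plus a matching Endpoints object."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # No "selector": Kubernetes will not manage endpoints for this Service.
        "spec": {"ports": [{"name": "http", "port": port, "targetPort": port}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # Same name and namespace as the Service, which is what links the two.
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": ip}],
            # Port name must match the Service port name.
            "ports": [{"name": "http", "port": port}],
        }],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service, endpoints]}


if __name__ == "__main__":
    # Hypothetical inference API on the dev machine.
    manifest = external_service("ai-inference", "ai", "192.168.1.50", 8000)
    print(json.dumps(manifest, indent=2))
```

Piping the output into kubectl apply -f - would make the service resolvable in-cluster as ai-inference.ai.svc.cluster.local on port 8000. If the dev machine already has its own DNS name, an ExternalName Service is the simpler alternative; newer Kubernetes releases also favour managing an EndpointSlice directly over the classic Endpoints API.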

To make this work, I rely on a combination of:

  • Cluster DNS (CoreDNS)
  • Internal service names (*.svc.cluster.local)
  • External exposure via NodePorts / Ingress when traffic has to cross the cluster boundary
  • Consistent naming conventions across deployments
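Consistent naming only pays off if every name is also a valid DNS label. Kubernetes validates Service names as RFC 1035 labels, so a small check in deploy tooling catches bad names before kubectl rejects them. The <app>-<component> convention here is a hypothetical example rather than the exact scheme used in this cluster:

```python
import re

# Kubernetes validates Service names as RFC 1035 labels:
# lowercase letters, digits and '-', starting with a letter,
# ending with a letter or digit, at most 63 characters.
DNS1035_LABEL = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_valid_service_name(name: str) -> bool:
    return len(name) <= MAX_LABEL_LENGTH and DNS1035_LABEL.fullmatch(name) is not None


def service_name(app: str, component: str) -> str:
    """Hypothetical convention: <app>-<component>, e.g. bible-api."""
    name = f"{app}-{component}".lower()
    if not is_valid_service_name(name):
        raise ValueError(f"{name!r} is not a valid Service name")
    return name


if __name__ == "__main__":
    print(service_name("bible", "api"))
    print(is_valid_service_name("gpu_inference"))
```

Running it prints bible-api and then False, since underscores are not allowed in DNS labels.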

Debugging DNS Issues

When things break in a cluster, DNS is usually one of the first suspects. Common issues include:

  • Pod can’t resolve service name
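  • Wrong namespace reference (a bare service name only resolves within the pod's own namespace)
  • CoreDNS crash or misconfiguration
  • Network policy blocking traffic, including DNS queries to CoreDNS on port 53 (UDP and TCP)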

Typical debugging tools:

  • kubectl get svc
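  • kubectl get endpoints (an empty list usually means the selector matches no ready pods)
  • nslookup or dig inside a debug pod
  • Checking CoreDNS logs (kubectl logs -n kube-system -l k8s-app=kube-dns)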
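For the debug-pod step, a small script can check the three name forms side by side, which quickly separates a namespace mistake from a broken CoreDNS. This is a sketch assuming Python is available in the debug pod (for example an image such as python:3.12-slim started with kubectl run); the resolver is injectable so the logic can be tested without a cluster.

```python
from __future__ import annotations

import socket
from typing import Callable


def system_resolve(name: str) -> list[str]:
    """Resolve through the pod's own resolver, so resolv.conf search domains apply."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def diagnose(service: str, namespace: str,
             resolve: Callable[[str], list[str]] = system_resolve,
             cluster_domain: str = "cluster.local") -> dict[str, list[str] | str]:
    """Try the short, namespaced and fully qualified forms of a Service name."""
    names = [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]
    results: dict[str, list[str] | str] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    import sys

    # Usage: python dns_check.py <service> <namespace>
    for name, result in diagnose(sys.argv[1], sys.argv[2]).items():
        print(f"{name:50} {result}")
```

If the short name fails but the namespaced forms succeed, the caller lives in a different namespace than the Service. If all three fail, the Service itself or CoreDNS is the next thing to check.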

In a multi-node rack setup like mine, it’s also important to verify node networking, especially when mixing ARM nodes and a more powerful x86 machine.


Why This Matters in My Architecture

As my system grows into a full ecosystem—Bible app, AI agents, game marketplace, CI/CD pipelines, and study tools—service discovery becomes the invisible layer that holds everything together.

Instead of manually wiring services, I can:

  • Deploy a new microservice
  • Let Kubernetes register it automatically
  • Immediately have it accessible across the cluster
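The "immediately accessible" step can be confirmed from a deploy script or CI job by polling until the new name resolves. A minimal sketch; the service name in the usage line is hypothetical, and resolve, sleep and clock are injectable so the loop can be tested without a cluster:

```python
import socket
import time
from typing import Callable


def wait_for_service(name: str, timeout: float = 60.0, interval: float = 2.0,
                     resolve: Callable[[str], str] = socket.gethostbyname,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until `name` resolves and return the address, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        try:
            return resolve(name)
        except OSError:
            # Not registered yet (or DNS not answering): retry until the deadline.
            if clock() >= deadline:
                raise TimeoutError(f"{name} did not resolve within {timeout}s")
            sleep(interval)


if __name__ == "__main__":
    # Hypothetical name for a freshly deployed microservice.
    print(wait_for_service("study-tools.default.svc.cluster.local", timeout=30))
```

A ClusterIP Service's name resolves as soon as the Service object exists, even before any pod is ready, so this only confirms registration; a health check against the service is still needed to confirm it is actually serving.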

This is what allows the system to scale from a few containers into a full distributed platform without becoming unmanageable.


Conclusion

DNS and service discovery are not just Kubernetes features—they are the foundation of how my entire homelab architecture communicates.

In my K3s rack setup, they allow a mix of Raspberry Pi nodes, a dedicated dev machine, and containerized services to behave like one unified system. Whether it’s AI workloads, CI/CD pipelines, or my Bible app backend, everything stays connected through simple service names instead of fragile infrastructure wiring.

As the system continues to grow, this layer will remain one of the most important pieces keeping the architecture stable, scalable, and easy to evolve.

Case Study

In Progress

Bible Verse — Case Study

Production SaaS Platform · Full-Stack · Founder & Sole Engineer

A domain-driven SaaS platform with five independently scalable system boundaries: scripture content delivery, RAG-backed AI study, real-time community interaction, async media processing, and infrastructure services — built and operated end-to-end.

Our Results

  • 37K+ Verses Indexed
  • 5 AI Models
  • 5 Bounded Domains
  • 3 Job Queues

How We Built It

  • RAG pipeline grounding AI responses in actual scripture rather than model memory
  • Hybrid Llama / OpenAI routing — local inference for cost, API fallback for quality at the edge
  • Non-blocking media processing — FFmpeg jobs enqueued via BullMQ, API never waits on transcoding
  • Cross-instance real-time consistency via Redis pub/sub behind WebSocket and WebRTC layers

Lessons Learned

  • Domain boundaries enforced at the service layer prevent coupling long before scale demands microservices.
  • RAG retrieval quality matters more than model size — better embeddings outperform a larger model on poor context.
  • Async queue design should be first-class, not bolted on; BullMQ worker isolation saved the request path repeatedly.

Stack

Nuxt 3, TypeScript, Nitro, PostgreSQL, Prisma, Redis, BullMQ, Weaviate, MinIO, FFmpeg, WebRTC, WebSockets, Llama 3.2, OpenAI API, Kubernetes

Written by

Full-Stack Engineer & Systems Architect

5+ years building production systems · AI, Backend & Infrastructure · Founder of Bible Logic

Full-stack engineer with 5+ years of hands-on experience designing and shipping production systems — from Nuxt 3 frontends and Nitro APIs to self-hosted Kubernetes clusters, RAG pipelines, and real-time AI applications. Everything I write comes from systems I've designed, deployed, and operated in production.

5+ Years Experience · AI Systems Specialist · Kubernetes & Infrastructure
Nuxt 3, TypeScript, PostgreSQL, Kubernetes, RAG / LLM, WebRTC, AWS IVS, Redis