Infrastructure Systems Engineering

Private Cloud Infrastructure

Self-hosted Kubernetes cluster powering storage, compute, CI/CD, observability, and distributed services across ARM64 homelab nodes.

Infrastructure Flow
User → Ingress → Kubernetes → Services → Storage → Observability

Requests flow through ingress controllers into Kubernetes workloads. Those workloads are backed by distributed storage and databases and monitored by a full observability stack.

Cluster: k3s ARM64 nodes (Pi 5 cluster)
Compute: ARM64 workloads + GPU offload
Storage: MinIO + persistent volumes
CI/CD: Gitea + runners + automation
Data: PostgreSQL + Redis + Weaviate
Observability: metrics, logs, tracing
Diagrams: Kubernetes cluster, storage layer, CI/CD pipeline, observability stack
Infrastructure Stack
k3s, Kubernetes, Raspberry Pi 5, MinIO, PostgreSQL, Redis, Weaviate, Gitea, Actions Runners, Prometheus, Grafana
Infrastructure Engineering

Building Private Cloud Infrastructure

Designing and operating a self-hosted Kubernetes cluster with distributed storage, networking, observability, CI/CD automation, and AI workloads on ARM64 nodes.

Core Principles

  • Everything runs as a containerized workload — no bare-metal service installs
  • Infrastructure is declarative (GitOps-first) — config is code, reviewed like code
  • Failure is expected and designed for — rescheduling, retries, and circuit breakers by default
  • Systems must be observable from day one — metrics, logs, and health checks before launch
  • Edge and local compute reduce cloud dependency, cost, and latency for internal workloads
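The failure-handling principle above can be illustrated with a minimal retry-plus-circuit-breaker sketch. The document does not say where breakers live in this cluster (application code, a client library, or a mesh sidecar). The names and defaults below (`CircuitBreaker`, `retry`, `failure_threshold=3`, `reset_timeout=30.0`) are therefore hypothetical. They only show the pattern: fail fast while a dependency is down, then let one trial call probe for recovery.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, fails fast for
    `reset_timeout` seconds, then lets a trial call through (half-open)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip at the threshold; a failed half-open trial also re-opens,
            # because the failure count is still at or above the threshold.
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry with exponential backoff; never retries into an open circuit."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

For some hypothetical `fetch` function, wrapping it as `retry(lambda: breaker.call(fetch))` keeps retries inside the breaker's accounting. Transient errors are retried with backoff, but once the breaker opens, `CircuitOpenError` propagates immediately instead of piling more load onto a failing service. The sketch is single-threaded; concurrent callers would need a lock around the state changes.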

Infrastructure Failure Modes

  • Node failure without pod rescheduling policies — workloads go dark silently
  • Storage saturation across persistent volumes causing write failures mid-operation
  • Network bottlenecks between services under burst load without pod-level limits
  • Silent service degradation with no alerting — issues surface hours late
  • Cluster resource fragmentation preventing scheduling of new pods even though total free capacity across nodes would cover the request
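Resource fragmentation is easiest to see with numbers. A pod lands on exactly one node, so the scheduler needs a single node whose unrequested allocatable capacity covers the pod's requests. The cluster-wide total does not help. The sketch below uses made-up free-capacity figures for three hypothetical nodes (`pi-1` to `pi-3`) and models only CPU and memory requests. The real kube-scheduler also filters on taints, affinity, pod-count limits, and more.

```python
from typing import Dict, List

# Free (allocatable minus already-requested) resources per node.
NodeFree = Dict[str, Dict[str, float]]


def total_free(nodes: NodeFree, resource: str) -> float:
    """Cluster-wide free amount of one resource, summed across nodes."""
    return sum(free.get(resource, 0.0) for free in nodes.values())


def feasible_nodes(nodes: NodeFree, request: Dict[str, float]) -> List[str]:
    """Nodes whose own free capacity covers every resource in the request.

    A pod is placed on exactly one node, so this per-node check, not the
    cluster-wide total, decides whether it can be scheduled."""
    return [
        name for name, free in nodes.items()
        if all(free.get(res, 0.0) >= amount for res, amount in request.items())
    ]


if __name__ == "__main__":
    # Hypothetical numbers: three nodes, each with 1.5 CPU cores and 2 GiB free.
    nodes = {f"pi-{i}": {"cpu": 1.5, "memory_gib": 2.0} for i in range(1, 4)}
    pod = {"cpu": 2.0, "memory_gib": 1.0}
    print(total_free(nodes, "cpu"))    # 4.5: plenty of CPU in aggregate
    print(feasible_nodes(nodes, pod))  # []: no single node fits a 2-core request
```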

Infrastructure Layers

Compute Cluster

k3s on 8 Raspberry Pi 5 ARM64 nodes. Workload scheduling, resource quotas, and pod autoscaling.

k3s, Raspberry Pi 5, ARM64
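Pod autoscaling in Kubernetes is driven by the Horizontal Pod Autoscaler. Its documented core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, and scaling is skipped when the ratio is within a tolerance (0.1 by default). A minimal sketch of that arithmetic follows. The `min_replicas`/`max_replicas` defaults are arbitrary illustration values. The real controller additionally handles missing metrics, unready pods, and scale-down stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 8, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule: ceil(current * current_metric / target_metric).

    Ratios within `tolerance` of 1.0 leave the replica count unchanged, and
    the result is clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 replicas averaging 90% CPU against a 50% target: ceil(2 * 1.8) = 4.
    print(desired_replicas(2, current_metric=90, target_metric=50))
```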

Networking

Ingress NGINX for cluster-edge routing. Internal DNS, service mesh patterns, and cert management.

Ingress NGINX, cert-manager, DNS
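Ingress routing at the cluster edge follows the Kubernetes Ingress matching rules. Requests are matched by host first. `Exact` paths must match exactly, while `Prefix` paths match element by element on `/`-separated segments, so `/api` matches `/api/v1` but not `/apix`. The longest matching path wins, with `Exact` preferred on a tie. The sketch below models those API semantics in plain Python; it is not how Ingress NGINX implements them internally, and the hostnames and backend names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    host: str
    path: str
    path_type: str  # "Exact" or "Prefix"
    backend: str


def _segments(path: str) -> List[str]:
    # "/api/v1/" -> ["api", "v1"]; empty segments (and trailing slashes) drop out.
    return [s for s in path.split("/") if s]


def _matches(rule: Rule, path: str) -> bool:
    if rule.path_type == "Exact":
        return path == rule.path
    prefix = _segments(rule.path)
    return _segments(path)[:len(prefix)] == prefix


def route(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the backend for a request, or None if no rule matches."""
    candidates = [r for r in rules if r.host == host and _matches(r, path)]
    if not candidates:
        return None
    # Longest path wins; on a tie, Exact beats Prefix.
    best = max(candidates, key=lambda r: (len(r.path), r.path_type == "Exact"))
    return best.backend


if __name__ == "__main__":
    rules = [
        Rule("git.home.lan", "/", "Prefix", "gitea"),
        Rule("git.home.lan", "/api", "Prefix", "gitea-api"),
    ]
    print(route(rules, "git.home.lan", "/api/v1/repos"))
```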

Storage

MinIO S3-compatible object storage for assets and model weights. Persistent volumes for stateful workloads.

MinIO, PVC, S3-Compatible
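For stateful workloads, a persistent volume is typically requested through a PersistentVolumeClaim. The sketch below builds one as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts just as it accepts YAML. It assumes k3s's bundled `local-path` StorageClass and a hypothetical `postgres-data` claim. The document does not say which storage class or volume sizes the cluster actually uses.

```python
import json


def pvc_manifest(name: str, size: str, storage_class: str = "local-path",
                 namespace: str = "default") -> dict:
    """Build a single-writer PersistentVolumeClaim manifest."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # mounted read-write by one node
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pvc_manifest("postgres-data", "20Gi"), indent=2))
```

Usage: `python pvc.py | kubectl apply -f -` (the file name is hypothetical). One trade-off worth noting with `local-path`: each volume lives on a single node's disk. A pod bound to such a claim therefore cannot simply be rescheduled onto another node if its node fails, which ties directly into the node-failure mode listed earlier.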

CI/CD

Self-hosted Gitea with Actions runners. Automated build, test, and deploy pipelines triggered on push.

Gitea, Actions, GitOps
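Build, test, and deploy stages are normally chained so that a failure stops everything downstream. In Gitea Actions, which follows GitHub Actions workflow syntax, that ordering is usually expressed with `needs:` between jobs. The sketch below shows only the fail-fast gating logic. Stand-in lambdas replace real build and test commands, and the stage names are illustrative.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, skip everything else."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, step in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = step()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    print(run_pipeline([
        ("build", lambda: True),
        ("test", lambda: False),   # a failing test suite...
        ("deploy", lambda: True),  # ...means deploy never runs
    ]))
```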

Observability Stack

Metrics

Prometheus scrapes node, container, and application metrics. Grafana dashboards surface CPU, memory, queue depth, and inference latency.
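Prometheus collects these metrics by periodically scraping an HTTP endpoint (conventionally `/metrics`) that serves its text exposition format. In practice a client library such as the official `prometheus_client` package generates this output. The hand-rolled sketch below only shows what a scraped gauge looks like. The metric name `worker_queue_depth` and its `queue` label are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def _escape_label(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one gauge family in the Prometheus text exposition format."""
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{_escape_label(v)}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{pairs}}}" if pairs else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge("worker_queue_depth", "Jobs waiting per queue.",
                       [({"queue": "inference"}, 12), ({"queue": "ingest"}, 3)]), end="")
```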

Logs

Centralized structured logging across all containers. Correlation IDs thread through API, worker, and AI service logs.
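One common way to thread correlation IDs through Python services is a `contextvars.ContextVar` that holds the current request's ID. A logging filter stamps that ID onto every record, and a formatter emits one JSON object per line. This is a sketch under those assumptions: the document does not name its logging library or log shipper, and `handle_request` is a hypothetical stand-in for a real request handler. How the ID crosses service boundaries (for example, as an HTTP header or a field on a queued job) is left out.

```python
import contextvars
import json
import logging
import uuid
from typing import Optional

# Holds the current request's correlation ID; "-" when outside a request.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copies the active correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emits one JSON object per line so a log collector can parse fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        })


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.addFilter(CorrelationFilter())
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


def handle_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was passed along, otherwise mint one."""
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        get_logger("api").info("request received")
        # Calls to workers or AI services would go here, forwarding `cid`.
        return cid
    finally:
        correlation_id.reset(token)
```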

Tracing

Request tracing across distributed services. Health check endpoints report status per dependency and are polled by the load balancer.
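A per-dependency health endpoint can be sketched as a function that runs one check per dependency and maps the aggregate result to an HTTP status. It returns 200 when everything passes and 503 otherwise. Kubernetes HTTP probes count only 200 through 399 as success, and load balancers generally treat a 5xx as unhealthy. The dependency names and checks in the tests are hypothetical stand-ins for real connection tests.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def aggregate_health(checks: Dict[str, Check]) -> Tuple[int, Dict[str, object]]:
    """Run every dependency check; return (HTTP status code, JSON-able body)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A check that raises (e.g. connection refused) counts as unhealthy.
            results[name] = f"error: {type(exc).__name__}"
    healthy = all(v == "ok" for v in results.values())
    status_code = 200 if healthy else 503
    return status_code, {"status": "ok" if healthy else "degraded", "checks": results}
```

One design caveat: if every shared dependency is wired into a readiness check, a single Redis outage can mark every pod unready at once. It may be worth separating "can serve some traffic" from "all dependencies are up".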

System Architecture Diagram

Diagram components: scheduling, app workloads, AI workloads, storage, ingress routing

Infrastructure Stack

Kubernetes (k3s), Raspberry Pi 5, Docker, ARM64, Ingress NGINX, cert-manager, MinIO, Gitea, Actions Runners, Prometheus, Grafana, Alertmanager

Frequently Asked Questions