Real Deployed Workloads

Introduction

This article documents real workloads currently running in my homelab environment. My setup is built around a Raspberry Pi-based K3s Kubernetes cluster, paired with a separate development machine that has an RTX 3090 GPU and runs AI workloads in Docker containers. Together, they form a hybrid system where lightweight services run on the cluster while compute-heavy AI tasks are offloaded to the GPU machine.

The goal of this environment is to simulate production-grade infrastructure at home, covering CI/CD pipelines, distributed services, AI inference workloads, and persistent storage systems. The setup also uses Gitea for source control, with self-hosted automation pipelines executed by runners inside Kubernetes.

Cluster Overview

My homelab rack is composed of:

A Raspberry Pi K3s cluster (control plane + worker nodes)
A dedicated development machine with RTX 3090 GPU (AI + model inference)
Docker-based service isolation for AI workloads
Gitea self-hosted Git platform
CI/CD runners deployed inside Kubernetes
External storage for persistent workloads and backups

This architecture allows me to separate concerns:

Kubernetes handles orchestration and service reliability
The GPU machine handles compute-heavy AI inference
CI/CD pipelines automate deployment from code push to cluster

Real Deployed Workloads

1. Gitea Self-Hosted Git Platform

I run a fully self-hosted Git service using Gitea inside my Kubernetes cluster.

Purpose:

Source control for all projects
Hosting private repositories for infrastructure, AI pipelines, and app development
Integration point for CI/CD workflows

Why it matters: This replaces GitHub for internal workflows and allows full control over deployment automation.

2. CI/CD Automation System (Gitea Runners)

I use Kubernetes-based runners connected to Gitea to automate deployments.

Flow:

Push code to repository
Trigger CI pipeline
Build Docker images
Deploy updated workloads to K3s cluster

Use cases:

FastAPI backend deployments
AI service updates
Infrastructure changes (Helm / YAML manifests)

This is essentially the glue between development and production inside my homelab.
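
The flow above can be sketched as an ordered list of commands that a runner executes after a push. This is a minimal illustration, not the actual pipeline definition: the registry host, image name, Deployment name, container name, and namespace are all hypothetical placeholders, and the real workflow may use Helm or plain manifests instead of `kubectl set image`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeployTarget:
    """Hypothetical description of one deployable service."""
    registry: str            # private registry host (assumed)
    image: str               # image name inside the registry
    deployment: str          # Kubernetes Deployment to update
    container: str           # container name inside that Deployment
    namespace: str = "default"


def plan_deploy(target: DeployTarget, commit_sha: str) -> List[List[str]]:
    """Return the commands for build -> push -> deploy, in execution order."""
    # Tag the image with a short commit hash so every push maps to one image.
    tag = f"{target.registry}/{target.image}:{commit_sha[:12]}"
    return [
        ["docker", "build", "-t", tag, "."],
        ["docker", "push", tag],
        # Point the Deployment at the new image, then wait for the rollout.
        ["kubectl", "set", "image", f"deployment/{target.deployment}",
         f"{target.container}={tag}", "-n", target.namespace],
        ["kubectl", "rollout", "status", f"deployment/{target.deployment}",
         "-n", target.namespace],
    ]


if __name__ == "__main__":
    # All values below are made up for illustration.
    target = DeployTarget("registry.example.lan", "auth-api", "auth-api", "api", "apps")
    for cmd in plan_deploy(target, "0123456789abcdef0123"):
        print(" ".join(cmd))
```

Returning the commands instead of running them keeps the plan easy to inspect and test; a runner step could pass each entry to `subprocess.run(cmd, check=True)`. Because the cluster nodes are Raspberry Pis, images built on an x86 machine would also need to target the ARM architecture.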

3. FastAPI Microservices Layer

Several backend services run as containerized FastAPI applications.

Examples:

Authentication service
Bible app APIs (core logic layer)
AI routing service (decides when to call local GPU models vs API models)

Why FastAPI:

Lightweight and fast
Easy integration with Kubernetes
Works well with async workloads
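
The decision made by the AI routing service listed among the examples could look roughly like the sketch below. The policy itself is an assumption for illustration (the thresholds, the `needs_private_data` flag, and the queue-depth signal are not taken from the real service); it only shows the shape of a "local GPU vs API model" choice as a pure function that a FastAPI handler could call.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    LOCAL_GPU = "local_gpu"        # model served from the RTX 3090 machine
    EXTERNAL_API = "external_api"  # hosted API model


@dataclass(frozen=True)
class InferenceRequest:
    prompt_tokens: int
    needs_private_data: bool = False  # hypothetical flag: data must stay local


def choose_backend(req: InferenceRequest, gpu_healthy: bool, gpu_queue_depth: int,
                   max_local_tokens: int = 4096, max_queue_depth: int = 4) -> Backend:
    """Pick a backend for one request (illustrative policy, assumed thresholds)."""
    if req.needs_private_data:
        # Private requests are never sent to an external API.
        if not gpu_healthy:
            raise RuntimeError("GPU node unavailable for a local-only request")
        return Backend.LOCAL_GPU
    if not gpu_healthy:
        return Backend.EXTERNAL_API
    # Spill over to the API when the prompt is too large or the GPU is busy.
    if req.prompt_tokens > max_local_tokens or gpu_queue_depth >= max_queue_depth:
        return Backend.EXTERNAL_API
    return Backend.LOCAL_GPU


if __name__ == "__main__":
    print(choose_backend(InferenceRequest(prompt_tokens=512), True, 1))
```

Keeping the policy free of I/O means the health and queue signals can come from anywhere (a health endpoint, a metrics scrape) without changing the decision logic.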

4. AI Inference Layer (RTX 3090 Docker Node)

My GPU machine runs AI models in isolated Docker containers.

Capabilities:

Local LLM inference
Embedding generation for semantic search
Image generation pipelines (DALL·E-style text-to-image using diffusion models)
Experimental LoRA fine-tuning workflows

Integration: Kubernetes services route requests to this machine when GPU compute is required.

This effectively turns the homelab into a small distributed AI system.
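
One common way to let in-cluster services reach a machine outside the cluster is a Service without a selector, paired with a manually managed Endpoints object that points at the external IP. The sketch below generates such a manifest as JSON. Whether this homelab uses that mechanism is an assumption, and the service name, namespace, IP address, and port are placeholders.

```python
import json
from typing import Any, Dict


def gpu_service_manifest(name: str, namespace: str, gpu_ip: str, port: int) -> Dict[str, Any]:
    """Build a selector-less Service plus matching Endpoints for an external host."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # No selector: Kubernetes will not manage the endpoints for this Service.
        "spec": {"ports": [{"name": "http", "port": port,
                            "targetPort": port, "protocol": "TCP"}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # The Endpoints object must share the Service's name and namespace.
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": gpu_ip}],
            "ports": [{"name": "http", "port": port, "protocol": "TCP"}],
        }],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service, endpoints]}


if __name__ == "__main__":
    # Hypothetical values: replace with the GPU machine's real address and port.
    manifest = gpu_service_manifest("gpu-inference", "ai", "192.168.1.50", 8000)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, after which pods can call the GPU machine through the in-cluster DNS name of the Service. Newer Kubernetes releases favor EndpointSlice objects over Endpoints, so the second object may need adapting depending on the cluster version.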

5. Vector Search + Retrieval Systems

I run retrieval systems for semantic search across theological and study content.

Stack includes:

Vector database (for embeddings)
Ingestion pipelines for Bible/Quran/text datasets
API layer for query retrieval

Use case: Powering AI-assisted Bible study features inside my application ecosystem.
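
At its core, semantic retrieval ranks stored embeddings by similarity to a query embedding. The sketch below shows that step with cosine similarity over an in-memory dictionary. It stands in for the vector database, whose actual product and API are not specified here; the passage IDs and the three-dimensional vectors are toy values, whereas real embeddings come from the embedding model and have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most similar (passage_id, score) pairs, best first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy index with made-up IDs and vectors, for illustration only.
    index = {
        "passage-a": [1.0, 0.0, 0.0],
        "passage-b": [0.0, 1.0, 0.0],
        "passage-c": [0.7, 0.7, 0.0],
    }
    for pid, score in top_k([1.0, 0.1, 0.0], index, k=2):
        print(pid, round(score, 3))
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking it returns to the API layer has the same shape.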

6. Database Layer (PostgreSQL + Supporting Services)

A centralized database layer supports most applications.

Responsibilities:

User accounts and authentication data
Posts, debates, and community content
AI metadata and logs

PostgreSQL is deployed on persistent Kubernetes volumes, with backups tied to the CI/CD workflows.
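
A backup job triggered from a pipeline might boil down to a timestamped `pg_dump` plus a simple retention rule. The sketch below is one possible shape under stated assumptions: the host, user, database name, output directory, and "keep the newest N" policy are hypothetical and do not describe the actual backup strategy.

```python
from datetime import datetime, timezone
from typing import List


def backup_filename(database: str, now: datetime) -> str:
    """Name backups with a UTC timestamp so they sort chronologically by name."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{database}-{stamp}.dump"


def pg_dump_command(host: str, user: str, database: str,
                    out_dir: str, now: datetime) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    target = f"{out_dir}/{backup_filename(database, now)}"
    return ["pg_dump", "-h", host, "-U", user, "-d", database,
            "-F", "c", "-f", target]


def backups_to_prune(filenames: List[str], keep: int) -> List[str]:
    """Return the oldest files to delete, keeping the newest `keep` backups."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Works because the timestamp format sorts lexicographically by time.
    return sorted(filenames)[:-keep]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    now = datetime.now(timezone.utc)
    print(" ".join(pg_dump_command("postgres.db.svc", "backup", "appdb", "/backups", now)))
```

Custom-format archives (`-F c`) are restored with `pg_restore`, which allows selective restores of individual tables.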

7. Media and Content Services

I also run supporting services for user-generated content:

Image uploads (stored on S3-compatible storage)
Video handling for livestream and recorded content
Ad creative storage system for in-app monetization

These services support the social and media features of the Bible app ecosystem.

8. Experimental Services

This is where I test new systems before production deployment:

WebRTC peer-to-peer debate system
Livestream routing through AWS IVS
Game marketplace backend logic
AI narration pipelines using ElevenLabs-style voice generation

These workloads often move into production once stabilized.

System Design Philosophy

This entire infrastructure is designed around three principles:

1. Separation of compute

Kubernetes = orchestration + lightweight services
GPU machine = AI inference + heavy compute

2. Self-host everything possible

Git (Gitea)
CI/CD runners
APIs and backend services
Data pipelines

3. Production simulation at home

The system mirrors real-world cloud architecture:

microservices
container orchestration
CI/CD automation
distributed compute routing

Conclusion

This homelab is not just a collection of services; it is a production-style infrastructure environment running locally. The combination of a Raspberry Pi K3s cluster, a GPU-powered AI node, and self-hosted CI/CD pipelines creates a system capable of supporting real applications at homelab scale.

As the system evolves, the focus is shifting toward tighter AI integration, improved workload routing, and more autonomous deployment pipelines. The end goal is a fully self-sustaining infrastructure where development, deployment, and AI inference all operate seamlessly across the cluster and GPU nodes.
