Overview
This project documents the evolution and current architecture of my enterprise-grade homelab, which serves as the backbone for all my personal projects, development work, and home automation. What began over eight years ago as a single desktop running services in separate terminals has been systematically upgraded into a resilient, secure, and highly available private cloud. The current iteration is built on bare-metal server hardware and orchestrated with Kubernetes.
The primary goal of this homelab is to provide a robust and scalable platform for hosting a wide array of services, from media servers and password managers to CI/CD pipelines and a private container registry. It functions as a practical learning environment for mastering cloud-native technologies, advanced networking, and system administration at scale. This ongoing project demonstrates a deep understanding of infrastructure design, automation, and security best practices, mirroring the environments found in modern tech companies.
Key Features
- Kubernetes-as-a-Service Platform: A 9-node RKE2 Kubernetes cluster (3 control-plane, 6 worker) runs on Proxmox, providing a declarative and resilient environment for deploying and managing over 30 containerized applications.
- Advanced Network Segmentation: The network is managed by an OPNsense firewall on dedicated hardware, utilizing VLANs to isolate traffic between personal devices, IoT gadgets, server management (iDRAC), and the Kubernetes cluster, enforcing a strong security posture.
- Centralized Authentication & Management: Authentik provides Single Sign-On (SSO) for various services, centralizing user access control. The entire Kubernetes environment is managed through Rancher, offering a unified UI for cluster operations, monitoring, and application deployment.
- Comprehensive Self-Hosted Service Suite: The platform hosts a wide range of critical applications, including Harbor for a private container registry, Home Assistant for smart home automation, Immich for photo management, Nextcloud for file storage, and Vaultwarden for secure password management.
- Automated DNS and Certificate Management: ExternalDNS integrates with Cloudflare to automatically manage public DNS records for services exposed via Ingress. TLS certificates are provisioned and renewed automatically by cert-manager and a custom Certbot job, so every exposed endpoint is served over TLS with a valid certificate.
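To illustrate the renewal side of the last item, here is a minimal sketch of the kind of check a renewal job performs: compare a certificate's expiry timestamp against a renewal window. The function names and the 30-day window are assumptions chosen for illustration, not values taken from this cluster's configuration.

```python
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: datetime, renew_before_days: float = 30) -> bool:
    """True once the certificate is inside the renewal window.

    The 30-day default is an illustrative assumption.
    """
    return days_until_expiry(not_after, now) <= renew_before_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = "Jan  5 09:34:43 2018 GMT"  # sample timestamp, long expired
    print(needs_renewal(sample, now))
```

In the real setup this decision is made by the renewal tooling itself; the sketch only shows the comparison it boils down to.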
Technologies & Implementation
The entire infrastructure is designed for resilience and performance, from the hardware up to the application layer. The foundation is a Dell PowerEdge R630 server with 128 GB of RAM and 52 vCPUs, providing ample resources for virtualization and container orchestration.
- Hypervisor: Proxmox VE was chosen for its robust, open-source, and feature-rich virtualization management capabilities, allowing for the efficient creation and management of VMs for the Kubernetes nodes and other isolated workloads like a dedicated macOS VM.
- Orchestration: RKE2 (Rancher Kubernetes Engine 2) provides a secure and conformant Kubernetes distribution. The 9-node topology (3 control-plane, 6 worker) keeps both the control plane and application workloads available when an individual node VM fails.
- Management: Rancher is deployed on top of the cluster to provide a comprehensive management plane, simplifying application deployment, monitoring (via integrated Prometheus and Grafana), and overall cluster administration.
- Networking: OPNsense and a managed switch form the core of the network, enabling granular control via VLANs and firewall rules. Inside Kubernetes, Calico is used as the CNI, and MetalLB provides load-balancing services for exposing applications.
- Storage: A Proxmox CSI plugin integrates the server’s local storage (2x 1TB SSDs and 5x 1TB HDDs) directly with Kubernetes, allowing for dynamic provisioning of Persistent Volumes for stateful applications like databases (PostgreSQL via CloudNativePG) and object storage (MinIO).
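The choice of three control-plane nodes comes down to quorum arithmetic. Assuming the control plane uses RKE2's default embedded etcd, the datastore stays writable only while a majority of members is reachable. A minimal sketch of that arithmetic (the function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(
            f"{n} control-plane nodes: quorum={quorum(n)}, "
            f"tolerates {tolerated_failures(n)} failure(s)"
        )
```

Three members tolerate one failure, and a fourth adds no extra tolerance, which is why odd member counts are the usual choice.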
graph TD
subgraph "Network Layer"
A["Internet (WAN)"] --> B["OPNsense Firewall (Mini PC)"]
B --> C["Managed Switch"]
C --> V1["VLAN 10: Personal Network (TP-Link Deco)"]
C --> V50["VLAN 50: IoT Network (TP-Link Deco)"]
C --> V100["VLAN 100: Server Management (iDRAC)"]
C --> V101["VLAN 101: Hypervisor (Proxmox)"]
C --> V102["VLAN 102: Kubernetes Nodes"]
end
subgraph "Infrastructure Layer"
D["Dell PowerEdge R630"] -- "iDRAC on" --> V100
D -- "Hosts" --> E["Proxmox VE Hypervisor"]
E -- "Management on" --> V101
end
subgraph "Virtualization & Orchestration"
E --> F["9-Node RKE2 Kubernetes Cluster (VMs)"]
E --> G["macOS VM (BlueBubbles)"]
F -- "Nodes on" --> V102
end
subgraph "Application & Service Layer (Kubernetes)"
F --> H["Rancher Management"]
F --> I["Authentik SSO"]
F --> J["Harbor Registry"]
F --> K["Home Assistant, Immich, Jellyfin, etc."]
F --> L["Project Hosting (Websites, APIs)"]
end
Challenges & Solutions
One of the primary challenges was migrating from a rudimentary single-node setup with manually managed services to a fully declarative, multi-node Kubernetes cluster without significant downtime for critical services. I took a phased approach: first containerizing every application with Docker, then building the new Proxmox and Kubernetes infrastructure in parallel with the old setup. I used MinIO for S3-compatible backups to stage application data, then performed a blue-green-style cutover by updating DNS records to point to the new Kubernetes Ingress once the new environment was validated. This methodical approach kept the transition smooth and preserved data integrity.
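The cutover decision can be sketched as a simple gate: DNS moves to the new environment only when every critical service passes its health check there. This is an illustration under stated assumptions, not the script I actually ran. The `HealthCheck` type, the function names, and the addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    service: str   # service name, e.g. "vaultwarden"
    healthy: bool  # result of probing the service in the NEW environment


def ready_to_cut_over(checks: list[HealthCheck], critical: set[str]) -> bool:
    """True only if every critical service has a passing check."""
    passing = {c.service for c in checks if c.healthy}
    return critical <= passing  # subset test: no critical service missing


def plan_dns_target(
    checks: list[HealthCheck], critical: set[str], old_target: str, new_target: str
) -> str:
    """Pick the address that DNS records should point at."""
    return new_target if ready_to_cut_over(checks, critical) else old_target


if __name__ == "__main__":
    checks = [
        HealthCheck("vaultwarden", True),
        HealthCheck("nextcloud", True),
        HealthCheck("immich", False),  # non-critical, may lag behind
    ]
    critical = {"vaultwarden", "nextcloud"}
    # Hypothetical addresses from the RFC 5737 documentation ranges.
    print(plan_dns_target(checks, critical, "192.0.2.10", "198.51.100.20"))
```

Keeping the old environment running until the gate passes is what makes the switch reversible: pointing DNS back at the old target undoes the cutover.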
Another challenge was designing and implementing a secure network architecture from scratch. This required learning enterprise networking concepts and applying them in a home environment. The solution was to use OPNsense to create strict firewall rules and multiple VLANs to isolate untrusted networks (like IoT devices) from sensitive infrastructure. For example, the server’s iDRAC management interface is on its own VLAN, completely inaccessible from the IoT or personal networks, drastically reducing its attack surface. This segmentation ensures that a compromise on one part of the network cannot easily spread to critical systems.
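The segmentation model is default-deny: traffic between VLANs is dropped unless a rule explicitly allows it. The sketch below models that idea with the VLAN IDs from the diagram. The short zone names and the specific entries in the allow list are hypothetical examples, not an export of my OPNsense rules. The only property taken from the text above is that iDRAC is unreachable from the IoT and personal networks.

```python
# VLAN IDs as shown in the network diagram; the short names are illustrative.
VLANS = {
    10: "personal",
    50: "iot",
    100: "idrac",
    101: "hypervisor",
    102: "kubernetes",
}

# Hypothetical allow list of (source, destination) pairs.
# Anything not listed here is denied.
ALLOWED = {
    ("personal", "kubernetes"),  # assumed: personal devices reach hosted services
    ("hypervisor", "idrac"),     # assumed: management traffic between admin VLANs
}


def is_allowed(src_vlan: int, dst_vlan: int) -> bool:
    """Default-deny check for traffic routed between two VLANs."""
    src, dst = VLANS[src_vlan], VLANS[dst_vlan]
    if src == dst:
        # Traffic within one VLAN is switched and does not cross the firewall.
        return True
    return (src, dst) in ALLOWED


if __name__ == "__main__":
    for src, dst in [(50, 100), (10, 100), (10, 102)]:
        verdict = "allow" if is_allowed(src, dst) else "deny"
        print(f"VLAN {src} -> VLAN {dst}: {verdict}")
```

Because the allow list is directional, permitting personal devices to reach the cluster does not let the cluster initiate connections back into the personal network.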
Results & Impact
This project has resulted in a stable, secure, and powerful private cloud that underpins all of my digital activities. It provides a production-grade environment for hosting personal projects, websites, and over 30 essential services with near-perfect uptime. More importantly, it serves as an invaluable, hands-on platform for continuous learning and experimentation with the same technologies used to run large-scale systems in the industry.
The infrastructure has automated away countless hours of manual administration through its use of declarative configurations, GitOps principles via Rancher Fleet, and automated certificate renewals. It has provided a deep, practical understanding of cloud-native architecture, from bare-metal provisioning and network design to application lifecycle management in Kubernetes.
