Homelab Infrastructure | Russell Brenner

#The platform

The homelab is the substrate for all the other projects. Every legal AI tool, every agent session, every deployment runs here.

#Architecture

Two-node k3s cluster with zero-NodePort architecture: every service gets a stable VIP via MetalLB L2 load balancing. Stable service addresses are the difference between runbooks that work and runbooks that need updating after every node restart.

100+ services across the cluster: CI/CD, container registry, secret management, DNS, monitoring, and all the projects listed on this site.

#Seven-layer monitoring

Seven independent layers, each with exclusive scope. No tool probes the same endpoint as another tool. Overlapping probes fire duplicate alerts and dilute signal.

Layer	Tool	Scope
Infrastructure connectivity	Blackbox Exporter	VIP reachability, TLS cert validity
Service layer	Uptime Kuma	HTTP checks via external DNS
Network devices	LibreNMS	SNMP, LLDP, OPNsense, switches
Workload metrics	Prometheus + Grafana	Pod CPU/memory, custom app metrics
Node metrics	Netdata	Per-node real-time system metrics
Host security	Wazuh SIEM	Host intrusion detection, log analysis
Runtime security	Falco	Container escape detection, k8s audit

#Auto-remediation

The auto-remediation pipeline deliberately refuses to restart OOMKilled pods. An OOMKilled pod needs a memory limit change, not a restart. Restarting it is papering over a real problem.

The pipeline fixes what it can fix confidently, escalates after two failed attempts, and gets out of the way for everything else.

#Why it’s worth documenting

One person operating 100+ production services with seven-layer monitoring and automated remediation is the concrete answer to “what can AI-augmented operations actually do?”

This isn’t a demo. It’s the platform that runs everything on this site.

#The platform

#Architecture

#Seven-layer monitoring

#Auto-remediation

#Why it’s worth documenting

Share this post