Projects
Global Traffic Engine
AWS · ROUTE53 · TERRAFORM
Multi-region traffic management with Route 53, Global Accelerator, and custom health checks. 99.99% availability.
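For context on the 99.99% figure, the sketch below converts an availability percentage into the downtime it allows. It is illustrative arithmetic only; the function name and the 365-day year are assumptions and are not taken from the project.

```python
# Downtime allowed by an availability percentage (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted over the given period."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    yearly = downtime_budget_minutes(99.99)
    monthly = downtime_budget_minutes(99.99, period_minutes=30 * 24 * 60)
    print(f"per year: {yearly:.2f} min, per 30-day month: {monthly:.2f} min")
```

At 99.99%, the budget works out to roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month.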
Autonomous Remediation
AWS · LAMBDA · PYTHON
Self-healing infrastructure with EventBridge + Lambda. Automatic incident resolution for common failure patterns.
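A minimal sketch of how an EventBridge-triggered Lambda might map events to remediations, assuming a simple lookup table keyed on the event's `source` and `detail-type`. The `REMEDIATIONS` table, `restart_stopped_instance`, and the returned action dictionaries are hypothetical illustrations, not the project's actual code. The sketch only describes the action it would take; a real handler would call the AWS SDK at that point.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Event = Dict[str, Any]
Action = Dict[str, Any]


def restart_stopped_instance(event: Event) -> Optional[Action]:
    """Hypothetical remediation: plan a restart when an instance has stopped."""
    detail = event.get("detail", {})
    if detail.get("state") != "stopped":
        return None  # not the failure pattern this remediation handles
    # A real implementation would call the EC2 API here instead of returning a plan.
    return {"action": "start_instance", "instance_id": detail.get("instance-id")}


# Hypothetical routing table: (source, detail-type) -> remediation function.
REMEDIATIONS: Dict[Tuple[str, str], Callable[[Event], Optional[Action]]] = {
    ("aws.ec2", "EC2 Instance State-change Notification"): restart_stopped_instance,
}


def handler(event: Event, context: Any = None) -> Action:
    """Lambda-style entry point: pick a remediation or escalate to a human."""
    key = (event.get("source"), event.get("detail-type"))
    remediate = REMEDIATIONS.get(key)
    if remediate is None:
        return {"action": "escalate", "reason": "no remediation registered"}
    action = remediate(event)
    if action is None:
        return {"action": "noop", "reason": "event did not match a failure pattern"}
    return action
```

Keeping the routing in a table means a new failure pattern is one more entry and one more function, and anything unrecognized falls through to escalation rather than being silently dropped.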
Self-Hosted GPU Inference
CUDA · LLAMA.CPP · LITELLM · DOCKER
llama.cpp on CUDA with multi-model tier scheduling, LiteLLM proxy routing, and Prometheus/Grafana observability.
CI/CD Pipeline Framework
JENKINS · GITHUB_ACTIONS · TERRAFORM
Standardized Jenkins/GitHub Actions pipelines with automated rollback, blue-green deploys, and security scanning.
Incident Response Automation
PYTHON · AWS_SSM · PAGERDUTY
IRP framework that cut MTTR from 90 to 54 minutes (a 40% reduction), with automated runbooks and on-call escalation.
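The 40% figure follows directly from the two MTTR values. As a quick check, with a hypothetical helper name:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    # MTTR values from the project description, in minutes.
    print(f"{percent_reduction(90, 54):.0f}% reduction")
```

That is a 36-minute drop on a 90-minute baseline, or 36 / 90 = 40%.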
Observability Stack
PROMETHEUS · GRAFANA · LOKI · TEMPO
Full-stack observability for multi-cluster Kubernetes: Prometheus for metrics, Loki for logs, Tempo for traces, and Grafana for dashboards.