Projects

Global Traffic Engine

AWS · ROUTE53 · TERRAFORM
Multi-region traffic management with Route 53, Global Accelerator, and custom health checks. 99.99% availability.

Autonomous Remediation

AWS · LAMBDA · PYTHON
Self-healing infrastructure with EventBridge + Lambda. Automatic incident resolution for common failure patterns.

Self-Hosted GPU Inference

CUDA · LLAMA.CPP · LITELLM · DOCKER
llama.cpp on CUDA with multi-model tier scheduling, LiteLLM proxy routing, and Prometheus/Grafana observability.

CI/CD Pipeline Framework

JENKINS · GITHUB_ACTIONS · TERRAFORM
Standardized Jenkins/GitHub Actions pipelines with automated rollback, blue-green deploys, and security scanning.

Incident Response Automation

PYTHON · AWS_SSM · PAGERDUTY
IRP framework that cut MTTR from 90 to 54 minutes (a 40% reduction). Automated runbooks and on-call escalation.

Observability Stack

PROMETHEUS · GRAFANA · LOKI · TEMPO
Full-stack observability for multi-cluster Kubernetes with Prometheus, Grafana, Loki, and Tempo.