AI-Augmented SRE Workflows
Built reusable Claude/Codex skill files at GoGuardian that automated vulnerability analysis and DDoS alert investigation — cutting 60+ minutes of manual security analysis to under 5 minutes per run.
Site Reliability Engineer
5+ years building resilient, secure, and cost-efficient infrastructure on AWS — Kubernetes, Terraform, CI/CD, and AI-augmented workflows.
Claude & Codex skill files that cut vulnerability analysis from 60+ min to under 5 min. Reasoning where scripts can't.
SLO definition, burn-rate alerting, and on-call infrastructure built to surface risk before customer impact.
EKS & GKE cluster management, blue-green upgrades, Karpenter node autoscaling, and Helm-based deployments.
CloudFront + WAF edge defense, secrets centralization, and automated kernel patching across the EC2 fleet.
Modular Terraform with remote state, mandatory tagging, and drift detection — every resource is reproducible.
Designed and built GitLab CI/CD from scratch across 5+ microservices — multi-stage pipelines, environment-gated deployments, and 60% less manual intervention.
Rightsizing and resource cleanup based on utilization analysis — Karpenter over Cluster Autoscaler for 25% compute savings, combined with scheduled scaling and idle resource deletion.
Designed AWS VPC architecture with public/private subnet segmentation, NAT gateways, and security groups across dev, QA, and production environments.
Production-grade Indian investment portfolio tracker built with Go and React 19. Multi-broker import (Zerodha, Groww, INDMoney), Gmail auto-import, FIFO cost basis, XIRR, TimescaleDB time-series snapshots, and AI market analysis, deployed as a single Docker binary on a Raspberry Pi.
Blue-green EKS cluster migration from a manually managed v1.23 cluster to a Terraform-provisioned v1.28 cluster with VPC-only access, achieving under 5 minutes of user-facing impact and 100% IaC coverage.
We moved from writing Python scripts for repetitive SRE tasks to using Claude skill files, and it changed how we think about automation. Here's what actually works, what doesn't, and why the distinction matters.
A comprehensive guide on migrating Jenkins from EC2 to Kubernetes (EKS), covering the challenges, solutions, and best practices for a successful migration.
After a 90-minute outage taught us that blocking traffic at the origin is already too late, we redesigned our DDoS defense around CloudFront and WAF at the edge, and haven't had a successful attack in 2 years.
Open to SRE, DevOps, and Platform Engineering roles. Always happy to connect.