Overview
As a Senior DevOps Engineer, you will own the AWS infrastructure and DevOps toolchain for a high-scale ad serving system composed of asynchronous Java microservices (Akka framework).
Targets include <50 ms response time and up to 5M concurrent users with 99.99% uptime.
Responsibilities
Design and stand up AWS environments end-to-end (landing zone, VPCs, networking, security, automation).
Build immutable infrastructure and CI/CD for Java microservices (Maven/Gradle) including blue/green & canary releases and automated rollbacks.
Implement observability: metrics, logs, traces, SLOs/SLIs, alerting, on-call runbooks.
Engineer reliability & performance: autoscaling, caching layers, multi-AZ/region DR, capacity planning to support 5M+ concurrent users and p95/p99 latency goals.
Establish security-by-design: IAM least privilege, KMS/Secrets Manager, WAF/Shield, image/signing policies, CIS benchmarks.
Partner with EY developers and the Performance Test Engineer to tune JVM/Akka settings, thread pools, GC, and infrastructure limits based on load-testing feedback.
Champion cost governance and tagging; produce dashboards and weekly reports.
Tech you’ll use (you don’t need every single one, but you know most)
AWS: EKS/ECS, EC2, ALB/NLB, API Gateway/Lambda, S3/CloudFront, DynamoDB/ElastiCache (Redis), Aurora/RDS, MSK/Kinesis, OpenSearch, Route 53, VPC, NAT/GW, WAF/Shield, CloudWatch/X-Ray, IAM, KMS, Secrets Manager.
IaC & CI/CD: Terraform/CloudFormation, Helm, Argo CD or Flux, GitHub Actions/Jenkins/GitLab CI, Docker.
Observability: CloudWatch, OpenTelemetry, Prometheus/Grafana, log pipelines.
Languages/Build: Bash/Python for automation; familiarity with Java build/release workflows.
What makes you a great fit
3–5+ years total experience; Senior/Manager-level depth in AWS platform engineering for high-throughput, low-latency services.
Proven ownership of production systems at 10k–1M+ concurrent users (or comparable high RPS) with SLOs of 99.9% or higher.
Hands-on experience with Akka/Java microservice delivery pipelines (a plus if you have tuned the JVM, GC, or Akka dispatchers).
Strong grounding in scaling patterns (event-driven, async IO, caching, backpressure, rate limiting) and resilience practices (circuit breakers, retries, chaos testing).
Excellent collaboration, documentation, and stakeholder communication.
Logistics
Location: Remote (India-based candidates preferred).
Schedule: Must join US morning calls (Eastern Time) as needed.
Start: 1–3 weeks from offer.
Term: Through end of January (extension likely).