Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice 
Location: India (Remote). Must be available to work in the EST (US/Canada) time zone.
 
Role Summary:  
Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?
 
We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice.
In this role, you’ll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments.
This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.
Why Join Us:
- Career Growth: Work alongside industry experts on cutting-edge cloud technologies
- Competitive Compensation and Benefits: We recognize and reward top talent
- Exciting, Impactful Work: Design and build scalable, resilient cloud environments
- Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure
 
What You Will Do:
- Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure
- Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
- Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
- Enhance Security and Compliance: Implement security best practices across DevOps workflows
- Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency
- Manage and Scale Large ELK Clusters: Handle log volumes of 2–3+ TB/day while ensuring high availability and performance
- Optimize ELK Architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
- Build and Tune Log Pipelines: Scale Logstash and Beats pipelines across distributed environments
- Support Kibana Observability Layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)
 
What You Bring:
- 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
- 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
- Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)
- Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components
- Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
- Proficiency in Python, Go, or Bash for automation and scripting
- Deep understanding of Kubernetes, Docker, and cloud-native architectures
- Experience with observability tools such as Prometheus, Grafana, and Azure Monitor
- Ability to work in a fast-paced, collaborative environment and solve complex operational issues
 
Education:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
 
Certifications (Nice to Have):
- Microsoft Azure certifications: AZ-104, AZ-400