Job Description
<p><b>Job Description:</b></p>
<p>We are looking for:<br/><br/> An experienced SRE &amp; DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability.<br/><br/> A hands-on engineer who ensures the reliability, performance, and scalability of systems.<br/><br/> A proactive problem solver with a strong focus on operational excellence and continuous improvement.<br/><br/> A collaborator who bridges development and operations through modern DevOps and SRE practices.<br/><br/> An effective communicator who thrives in cross-functional teams and drives best practices.</p>
<p>This role matters to us:<br/><br/> The Senior SRE &amp; DevOps Engineer plays a vital role in ensuring the resilience, scalability, and reliability of our systems.<br/><br/> By applying modern SRE principles, automation, and incident management practices, you will enable faster, more reliable delivery of business value while safeguarding system stability and customer ...</p>
<p><b>Responsibilities:</b></p>
<p> - Design, implement, and maintain scalable, secure, cloud-native infrastructure.</p>
<p> - Set up and maintain observability solutions, including monitoring, alerting, logging, and tracing (e.g., Prometheus, Grafana, ELK, DataDog).</p>
<p> - Continuously improve CI/CD pipelines and automate deployment workflows to increase delivery efficiency.</p>
<p> - Lead structured incident response and root cause analysis, and drive a culture of post-mortem learning.</p>
<p> - Collaborate closely with developers, QA, and architects to ensure seamless integration and performance optimization.</p>
<p> - Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to guide operational decisions and system reliability.</p>
<p> - Champion Infrastructure-as-Code practices using Terraform, Helm, or Ansible.</p>
<p> - Ensure security, compliance, and reliability are embedded into operations.</p>
<p> - Mentor team members and foster a culture of operational excellence and continuous improvement.</p>
<p><b>Education:</b></p>
<p> - Bachelor's or Master's degree in Computer Science, Engineering, or equivalent practical experience.</p>
<p><b>Work Experience:</b></p>
<p> - 6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles.</p>
<p> - Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies such as Istio.</p>
<p> - Strong background in CI/CD practices and tools (GitHub Actions, Jenkins X, ArgoCD, or similar).</p>
<p> - Experience with observability solutions (Prometheus, Grafana, ELK, Jaeger, DataDog, GCP Dashboards).</p>
<p> - Proficiency with at least one major cloud platform (GCP, AWS, Azure).</p>
<p> - Scripting or programming experience (Python, Go, Bash, or similar).</p>
<p> - Practical knowledge of Infrastructure-as-Code tools such as Terraform, Helm, or Ansible.</p>
<p> - Hands-on experience managing incidents, troubleshooting, and performing root cause analysis.</p>
<p> - Familiarity with SRE practices (SLIs, SLOs, SLAs, error budgets).</p>
<p><b>Requirements:</b></p>
<p> - Strong communication and collaboration skills across cross-functional teams.</p>
<p> - Ability to balance short-term operational needs with long-term scalability and system health.</p>
<p> - Analytical and proactive mindset with a focus on continuous improvement.</p>
<p> - Fluency in English (written and ...).</p>
<p><br/></p>
<p> - Experience with security best practices in distributed systems (OAuth2, mTLS, RBAC).</p>
<p> - Knowledge of cost optimization and cloud governance practices.</p>
<p> - Familiarity with Camunda/CIB7 environments.</p>
<p> - Contributions to open-source DevOps/SRE communities.</p>
<p>(ref:hirist.tech)</p>