Description   
     Join our organization as a Lead Systems Engineer (DevOps & SRE) and play a crucial role in the reliability, scalability, capacity planning, and performance of our infrastructure and applications.
  The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies, and will lead the design, development, and maintenance of scalable and reliable infrastructure.
  You will also be responsible for implementing and managing CI/CD pipelines, monitoring system performance and reliability, developing and maintaining automation tools, ensuring security and compliance, mentoring and guiding junior SREs and DevOps engineers, and staying up-to-date with the latest industry trends and technologies.
  Technologies   
     CI/CD, Jenkins, Docker, Kubernetes, Terraform, Ansible, Python, Prometheus, Grafana, ELK stack, Splunk, Dynatrace, Datadog or similar, SLI, SLO, SLA and Error Budget concepts    
  Responsibilities   
     Lead the design, development, and maintenance of scalable and reliable infrastructure
     Implement and manage CI/CD pipelines to ensure efficient and smooth software releases
     Monitor system performance and reliability, proactively identifying and resolving issues
     Develop and maintain automation tools to streamline infrastructure management and deployment processes
     Collaborate with development teams to ensure best practices for software development, deployment, and operations
     Ensure security and compliance across all infrastructure and operations
     Mentor and guide junior SREs and DevOps engineers, fostering a culture of collaboration and continuous learning
     Conduct root cause analysis of system failures and implement solutions to prevent recurrence
     Optimize resource utilization to ensure cost-effective operations
     Stay up-to-date with the latest industry trends and technologies, integrating them into our processes where appropriate
  Requirements   
     8+ years of experience in a DevOps/SRE role
     Strong experience with cloud platforms (AWS, GCP, Azure)
     Proficiency in infrastructure as code (IaC) tools (Terraform, CloudFormation, etc.)
     Extensive experience with containerization and orchestration (Docker, Kubernetes)
     Strong knowledge of CI/CD tools (Jenkins, GitLab CI, CircleCI, etc.)
     Proficiency in scripting languages (Python, Bash, etc.)
     Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.)
     Ability to participate in capacity planning and scalability assessments to support business growth and requirements
     Solid understanding of SLI, SLO, SLA, and Error Budget concepts and their implementation
     Ability to provide on-call support and participate in incident management & response activities as needed
     Solid understanding of networking and security principles
     Excellent problem-solving skills and the ability to work under pressure
     Strong communication and collaboration skills
     B2+ English level proficiency
  We offer   
     Opportunity to work on technical challenges with impact across geographies
     Vast opportunities for self-development: online university, global knowledge sharing, and learning through external certifications
     Opportunity to share your ideas on international platforms
     Sponsored Tech Talks & Hackathons
     Unlimited access to LinkedIn learning solutions
     Possibility to relocate to any EPAM office for short and long-term projects
     Focused individual development
     Benefit package:
          Health benefits
          Retirement benefits
          Paid time off
          Flexible benefits
     Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)