
Zycus - Site Reliability Engineering Manager Job Opening in Mumbai – Zycus Infotech Pvt Ltd

Zycus Site Reliability Engineering Manager



Job description

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems.

The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture, ensuring automation, performance, and reliability across our SaaS platform.

Roles and Responsibilities:

- System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
- Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
- Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
- Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
- Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
- Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
- Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
- Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
- Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
- Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
- Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
- Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
- Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
- Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
- On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirements:

Experience: 5 to 12 years.

Technical skills are listed below.

Must Have:

Kubernetes Expertise:

- Hands-on experience with installing and provisioning Kubernetes clusters.
- Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.
- Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.

Kubernetes Distributions:

- Hands-on experience with different Kubernetes provisioners and distributions.

Kubernetes Cluster Administration:

- Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.
- Familiarity with cluster health monitoring and troubleshooting.

Monitoring Tools: Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.

Automation & Scripting:

- Strong programming skills in Python, Shell, or similar languages.
- Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.
- Cloud automation experience, ideally with AWS or other major cloud platforms.

Operating Systems: Hands-on experience with Linux systems.

Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good To Have Skills:

- Experience with OpenShift virtualization in production environments.
- Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
- CKA (Certified Kubernetes Administrator) certification or equivalent.
- Experience in fine-tuning RHEL, CentOS, and Ubuntu.
- Familiarity with DevSecOps practices, container security, and compliance frameworks.

(ref:hirist.tech)


Required Skill Profession

Computer Occupations


