Job Description
<p><p><b>Description : </b><br/><br/>Job Title : SRE / DevOps Engineer Cloud (AWS & Azure).<br/><br/>Location : Remote.<br/><br/>Department : Engineering / DevOps.<br/><br/>Total Experience : 9+ years [6+ years of relevant experience].<br/><br/>Experience : 6+ years.<br/><br/><b>About the Role : </b><br/><br/>We are looking for a skilled and experienced SRE/DevOps Engineer to join our cloud infrastructure team.<br/><br/>The ideal candidate will have deep expertise in AWS and Azure cloud platforms, strong networking knowledge, and hands-on experience with modern CI/CD pipelines, Git workflows, and GitOps tools.<br/><br/>You will play a key role in designing, implementing, and maintaining reliable, scalable, and observable infrastructure to support our development and operations teams.<br/><br/><b>Key Responsibilities : </b><br/><br/>- Design, deploy, and manage scalable cloud infrastructure on AWS and Azure, focusing on compute, networking, and storage components.<br/><br/>- Implement and maintain networking solutions such as VPCs, VNets, subnets, routing, VPNs, security groups, network security, and load balancers across AWS and Azure.<br/><br/>- Develop and manage CI/CD pipelines using GitHub Actions, integrating automated build, test, and deployment workflows.<br/><br/>- Implement Git workflows to enable smooth collaboration, including branching strategies, pull requests, and code reviews.<br/><br/>- Manage Kubernetes deployments using Argo CD for GitOps-based continuous deployment on AWS EKS and Azure AKS.<br/><br/>- Build and maintain observability and monitoring solutions using Datadog, Prometheus, and Fluent Bit to ensure system health, performance, and availability.<br/><br/>- Collaborate closely with software engineering, infrastructure, and security teams to ensure reliability, scalability, and security best practices.<br/><br/>- Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform preferred).<br/><br/>- Troubleshoot incidents and 
provide root cause analysis to improve system resilience and reduce downtime.<br/><br/>- Optimize cloud resource utilization and cost management on AWS and Azure.<br/><br/><b>Qualifications : </b><br/><br/>Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.<br/><br/><b>Required Skills : </b><br/><br/>- 6+ years of experience as an SRE or DevOps Engineer working with AWS and Azure cloud environments.<br/><br/>- Experience building and maintaining observability and monitoring solutions using Datadog, Prometheus, AppDynamics, and other market-leading tools to ensure system health, performance, and availability.<br/><br/>- Strong hands-on experience with AWS components : VPC, EC2, ELB, Route 53, IAM, CloudWatch, Lambda, S3, RDS, EKS.<br/><br/>- Strong hands-on experience with Azure components : Virtual Network (VNet), Azure Compute (VMs, App Services), Azure Load Balancer, Azure DNS, Azure AD, Azure Monitor, AKS.<br/><br/>- Deep networking knowledge including TCP/IP, DNS, VPN, Load Balancers, Security Groups, NSGs, routing, and firewall configurations.<br/><br/>- Expertise in CI/CD pipeline creation and management using GitHub Actions.<br/><br/>- Proven experience with Git workflows including feature branching, pull requests, merges, and conflict resolution.<br/><br/>- Experience with Kubernetes and GitOps deployment using Argo CD.<br/><br/>- Proficiency with monitoring and observability tools : Datadog, Prometheus, Fluent Bit, and Grafana.<br/><br/>- Skilled in scripting and automation (Python, Bash, PowerShell).<br/><br/>- Experience with Infrastructure as Code tools such as Terraform or ARM templates.<br/><br/>- Experience with container orchestration beyond EKS and AKS.<br/><br/>- Strong problem-solving skills and experience in incident management and root cause analysis.<br/><br/>- Excellent communication skills and ability to collaborate across multiple teams.<br/><br/><b>Preferred Qualifications : </b><br/><br/>- Certifications such as 
AWS Certified Solutions Architect, Azure Administrator Associate, or Certified Kubernetes Administrator (CKA).<br/><br/>- Knowledge of security best practices for cloud-native environments.<br/><br/>- Familiarity with additional DevOps tools such as Jenkins, Ansible, or Helm.<br/><br/>- Prior knowledge of mainframe systems.</p><br/></p> (ref:hirist.tech)