<p><p><b>Job Description : </b></p><p><p><b><br/></b></p><p><b>Role : DevOps Engineer</b></p><br/>We are seeking a highly skilled DevOps Engineer to join our dynamic team.<br/><br/>The ideal candidate will have strong expertise in automation, cloud infrastructure (with a focus on AWS and GenAI services), CI/CD, and containerization, along with a deep understanding of security best practices, monitoring, and system optimization.<br/><br/>This role requires a balance of technical proficiency, problem-solving, and collaboration skills to ensure smooth deployment, scalability, and reliability of applications and infrastructure.<br/><br/><b>Key Responsibilities : </b></p><p><br/></p><p>- Design, automate, and manage scalable, secure, and high-availability cloud infrastructure.<br/><br/>- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.<br/><br/>- Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, CircleCI, or AWS CodePipeline.<br/><br/>- Automate routine tasks using Python and shell scripting.<br/><br/>- Monitor and optimize system performance using Prometheus, Grafana, ELK stack, or AWS CloudWatch.<br/><br/>- Manage databases (MySQL, PostgreSQL, MongoDB, DynamoDB), including backup, recovery, and performance tuning.<br/><br/>- Deploy and manage web applications in production environments with Nginx, Apache, or similar servers.<br/><br/>- Ensure cloud, networking, and server security using IAM, VPC, security groups, and firewalls.<br/><br/>- Manage source control and team collaboration using Git and branching strategies.<br/><br/>- Work with containerization and orchestration technologies (Docker, Kubernetes, ECS).<br/><br/>- Implement disaster recovery, backup, and high-availability strategies.<br/><br/>- Troubleshoot incidents, perform root cause analysis, and implement preventive measures.<br/><br/>- Collaborate with cross-functional teams, ensuring effective communication and documentation.<br/><br/><b>Required Skills & 
Experience : </b><br/><br/><b>Infrastructure as Code (IaC) : </b></p><p><br/></p><p>- Hands-on experience with Terraform, AWS CloudFormation, or similar.</p><p><br/><p>- Proficient in automating infrastructure deployment and management tasks.</p><p><br/>- Knowledge of configuration management tools (Ansible, Chef, Puppet).</p></p><br/><b>Monitoring & Logging : </b></p><p><b><br/></b></p><p>- Experience with monitoring tools (Prometheus, Grafana, ELK Stack, AWS CloudWatch).<br/><br/>- Ability to set up alerts, dashboards, and audit logs for system health and performance.<br/><br/><b>Cloud Platforms (AWS; GenAI Services Experience Required) : </b></p><p><br/></p><p>- Strong knowledge of AWS services : EC2, S3, RDS, Lambda, Bedrock, OpenSearch, Knowledge Bases, IAM, VPC, CodeDeploy, CodePipeline, SQS, etc.<br/><br/>- Familiarity with cloud-native architectures; multi-cloud experience is a plus.<br/><br/><b>Scripting & Automation : </b></p><p><br/></p><p>- Python : Scripting, automation, Boto3 for AWS, Flask/Django familiarity (bonus).<br/><br/>- Shell scripting : Strong skills in bash or similar for deployment and system automation.<br/><br/><b>Database Management : </b></p><p><br/></p><p>- Experience with MySQL, PostgreSQL, MongoDB, DynamoDB.<br/><br/>- Backup, recovery, performance tuning, and database security best practices.<br/><br/><b>Web Application Deployment & Server Management : </b><br/><br/>- Experience with production deployments and web/application servers (Nginx, Apache).<br/><br/>- Knowledge of reverse proxies, SSL/TLS setup, and security hardening.<br/><br/><b>Security & Networking :</b></p><p><br/></p><p>- Cloud security best practices, IAM management, firewalls, and VPC configurations.<br/><br/>- Strong understanding of TCP/IP, DNS, HTTP/HTTPS, and load balancer setups.<br/><br/><b>CI/CD & Version Control : </b></p><p><br/></p><p>- Proficient in Git workflows (GitFlow, trunk-based) for multi-team management.<br/><br/>- Experience with CI/CD pipelines (Jenkins, GitLab CI, 
AWS CodePipeline).<br/><br/>- Knowledge of containerization (Docker) and orchestration (Kubernetes, ECS).<br/><br/><b>High Availability & Scaling :</b></p><p><br/></p><p>- Load balancing strategies (AWS ELB, HAProxy), failover planning.<br/><br/>- Auto-scaling in cloud platforms and performance optimization.<br/><br/><b>Backup, Recovery & Incident Response :</b></p><p><br/></p><p>- Implementation of disaster recovery, redundancy strategies, and system resilience.<br/><br/>- Troubleshooting, root cause analysis, and preventive measures.<br/><br/><b>Collaboration & Project Management : </b></p><p><br/></p><p>- Strong communication and documentation skills.<br/><br/>- Ability to collaborate across teams and explain technical concepts to non-technical stakeholders.<br/><br/>- Familiarity with Agile methodologies (Scrum, Kanban) and tools (Jira, Trello) is a plus.<br/><br/><b>Preferred/Optional Skills :</b></p><p><br/></p><p>- Dockerfile and Docker Compose creation for multi-container applications.<br/><br/>- Serverless architecture with AWS Lambda, SQS, SNS.<br/><br/>- Project management and task prioritization.</p><br/></p>