Cloud Infrastructure Automation Engineer Job Opening in Hyderabad at Sonata Software

Job description

Role: Site Reliability Engineer

Location: Hyderabad

Notice Period: Immediate to 20 Days

Employment Type: Full Time

Experience

  • 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)

Primary Skills (Must-Have)

  • AWS, CI/CD, Jenkins, IaC (Infrastructure as Code), Terraform, Kubernetes

Secondary Skills (Good-to-Have)

  • AWS systems; Dataiku data; platform updates and patching

Tools & Platforms

  • Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt
  • CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform
  • Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions
  • Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk
  • Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard
  • Collaboration & Code Review: GitHub, Jira, Confluence

Key Responsibilities

Data Pipeline Reliability & Observability:

- Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing

- Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows

- Automate data pipeline health checks, error handling, and auto-remediation strategies


Infrastructure & Cloud Automation:

- Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation

- Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics

- Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions


Performance, Monitoring & Incident Response:

- Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus

- Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime

- Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents


Security & Compliance:

- Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies

- Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption

- Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access


Collaboration & Leadership:

- Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability

- Participate in incident response drills, disaster recovery planning, and security compliance reviews

- Advocate for best practices in automation, cost optimization, and cloud-native data solutions


Required Skill Profession

Computer Occupations


