Job Description
About the Role:

We are seeking a highly skilled Senior Data Engineer to design, build, and optimize scalable data pipelines and platforms that power analytics, reporting, and machine learning use cases.
This role requires strong technical expertise in distributed data systems, cloud-native architectures, and automation frameworks.
The ideal candidate is passionate about building reliable, high-performance data infrastructure and enabling data-driven decision-making across the enterprise.

Key Responsibilities:

- Data Pipeline Development: Design, develop, and maintain scalable ETL/ELT pipelines for ingesting, transforming, and processing large volumes of structured and unstructured data.
- Platform Engineering: Enhance data-processing frameworks, orchestration workflows, monitoring systems, and CI/CD pipelines leveraging AWS, GitLab, and open-source technologies.
- Optimization & Automation: Identify opportunities to automate manual processes, optimize workflows for efficiency, and re-architect solutions for improved scalability, availability, and usability.
- Collaboration: Partner with product managers, data scientists, and application teams to understand requirements, define data models, and ensure reliable data delivery for analytical use cases.
- Platform Support: Provide guidance, training, and technical support to internal stakeholders consuming platform services.
- Monitoring & Reliability: Establish metrics, implement monitoring tools, and configure alerting mechanisms to proactively track system health, detect anomalies, and ensure SLA adherence.
- Best Practices: Enforce coding standards, data governance policies, and DevOps practices for secure and compliant data solutions.

Qualifications & Technical Skills:

Core Expertise:

- Proven experience building and optimizing data pipelines in distributed environments.
- Strong programming expertise in Python and PySpark (4+ years).
- Advanced proficiency in SQL for querying, data modeling, and performance tuning.
- Hands-on experience with Linux environments and shell scripting.

Cloud & Tools:

- Experience with AWS services such as S3, Glue, EMR, Redshift, Lambda, and Athena.
- Familiarity with CI/CD and version control tools: Git/Bitbucket, Jenkins, AWS CodeBuild, and CodePipeline.
- Exposure to monitoring and alerting tools (e.g., CloudWatch, Prometheus, Grafana, ELK).

Additional Skills:

- Working knowledge of the Palantir platform is a strong plus.
- Experience collaborating with cross-functional teams (data scientists, analysts, DevOps, application developers).
- Strong problem-solving and analytical skills, with the ability to debug complex data issues.
- Knowledge of distributed computing concepts, data partitioning, and performance tuning.

Preferred Attributes:

- Experience with modern data lakehouse and streaming platforms (Databricks, Kafka, Delta Lake).
- Understanding of data security, governance, and compliance in regulated environments.
- Ability to design highly available, cost-optimized, production-ready solutions.
- Strong communication skills to engage stakeholders and present technical solutions clearly.