Job Description
Title: Data Engineer

About the Role:

We are seeking a highly skilled Data Engineer to join our team.
The ideal candidate will have expertise in Databricks, Python, and SQL.

Key Responsibilities:

- Design, develop, and maintain scalable ETL/ELT data pipelines using Databricks (PySpark) on Azure/AWS/GCP.
- Develop clean, reusable, and performant Python code for data ingestion, transformation, and quality checks.
- Write efficient and optimized SQL queries for querying structured and semi-structured data.
- Work with stakeholders to understand data requirements and implement end-to-end data workflows.
- Perform data profiling and validation, and ensure data quality and integrity.
- Optimize data pipelines for performance and reliability, and integrate data from various sources (APIs, flat files, databases, and cloud storage such as S3 and ADLS).
- Build and maintain Delta tables using the Delta Lake format for ACID-compliant streaming and batch pipelines.
- Use Databricks Workflows to orchestrate pipelines and scheduled jobs.
- Collaborate with DevOps and cloud teams to ensure secure, scalable, and compliant infrastructure.

Technical Skills Required:

Core Technologies:
- Databricks (Spark on Databricks, Delta Lake, Unity Catalog)
- Python, with strong knowledge of PySpark
- SQL (advanced: joins, window functions, CTEs)

Workflow & Orchestration:
- Databricks Workflows / Jobs
- Airflow, Azure Data Factory, or similar orchestration tools
- Auto Loader, Structured Streaming

Cloud Platforms (any one or more):
- Azure: Databricks on Azure, ADLS, ADF, Synapse
- AWS: Databricks on AWS, S3, Glue, Redshift
- GCP: Dataproc, BigQuery, GCS

Data Modeling & Storage:
- Experience working with Delta Lake, Parquet, and Avro
- Understanding of dimensional modeling, data lakes, and lakehouse architectures

CI/CD & Version Control:
- CI/CD pipelines for Databricks via Git, Azure DevOps, or Jenkins
- Logging, debugging, and monitoring with tools like Datadog, Prometheus, or cloud-native tools

Optional/Preferred:
- Knowledge of MLflow, Feature Store, or MLOps workflows
- Experience with REST APIs, JSON, and data ingestion from third-party services
- Familiarity with dbt (Data Build Tool) or Great Expectations for data quality

Soft Skills:
- Strong analytical, problem-solving, and debugging skills
- Clear communication and documentation skills
- Ability to work independently and within cross-functional teams
- Agile/Scrum working experience

(ref: hirist.tech)