Job Description
Job Title: Data Engineer

Data Engineer at Egisedge Technologies Pvt Ltd is a highly skilled role focused on designing, developing, and maintaining scalable ETL/ELT data pipelines using Databricks (PySpark) on Azure/AWS/GCP.

Key Responsibilities:

- Design, develop, and maintain scalable ETL/ELT data pipelines using Databricks (PySpark) on Azure/AWS/GCP (see the batch pipeline sketch below).
- Develop clean, reusable, and performant Python code for data ingestion, transformation, and quality checks.
- Write efficient and optimized SQL queries for querying structured and semi-structured data.
- Work with stakeholders to understand data requirements and implement end-to-end data workflows.
- Perform data profiling and validation, and ensure data quality and integrity.
- Optimize data pipelines for performance and reliability.
- Integrate data from various sources: APIs, flat files, databases, and cloud storage (e.g., S3, ADLS).
- Build and maintain Delta tables using the Delta Lake format for ACID-compliant streaming and batch pipelines.
- Work with Databricks Workflows to orchestrate pipelines and scheduled jobs.
- Collaborate with DevOps and cloud teams to ensure secure, scalable, and compliant infrastructure.

Technical Skills Required:

Core Technologies:

- Databricks: Spark on Databricks, Delta Lake, Unity Catalog
- Python, with strong knowledge of PySpark
- SQL (advanced level): joins, window functions, CTEs, aggregation (see the SQL sketch below)

ETL & Orchestration:

- Databricks Workflows / Jobs
- Airflow, Azure Data Factory, or similar orchestration tools
- Auto Loader / Structured Streaming preferred (see the streaming sketch below)

Cloud Platforms (any one or more):

- Azure: Databricks on Azure, ADLS, ADF, Synapse
- AWS: Databricks on AWS, S3, Glue, Redshift
- GCP: Dataproc, BigQuery, GCS

Data Modeling & Storage:

- Experience working with Delta Lake, Parquet, Avro
- Understanding of dimensional modeling, data lakes, and lakehouse architectures

Monitoring & Version Control:

- CI/CD pipelines for Databricks via Git, Azure DevOps, or Jenkins
- Logging, debugging, and monitoring with tools like Datadog, Prometheus, or cloud-native services

Optional/Preferred:

- Knowledge of MLflow, Feature Store, or MLOps workflows
- Experience with REST APIs, JSON, and data ingestion from third-party services
- Familiarity with dbt (Data Build Tool) or Great Expectations for data quality

Soft Skills:

- Strong analytical, problem-solving, and debugging skills
- Clear communication and documentation skills
- Ability to work independently and within cross-functional teams
- Agile/Scrum working experience

(ref: hirist.tech)
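To illustrate the kind of batch pipeline work listed under Key Responsibilities, here is a minimal sketch of a PySpark step on Databricks that ingests raw files, applies simple quality rules, and writes a Delta table. It assumes the `spark` session provided by the Databricks runtime; the paths, column names, and table name are hypothetical placeholders, not part of the posting.

```python
# Minimal batch ETL sketch (hypothetical paths, columns, and table names).
from pyspark.sql import functions as F

RAW_PATH = "s3://example-bucket/raw/orders/"      # hypothetical source location
TARGET_TABLE = "analytics.orders_silver"          # hypothetical Delta table

def build_orders_silver(spark):
    """Ingest raw order files, apply basic quality checks, and write a Delta table."""
    raw = spark.read.format("parquet").load(RAW_PATH)

    cleaned = (
        raw
        .dropDuplicates(["order_id"])                        # basic de-duplication
        .filter(F.col("order_amount").isNotNull())           # simple quality rule
        .withColumn("ingested_at", F.current_timestamp())    # audit column
    )

    (
        cleaned.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(TARGET_TABLE)
    )

# In a Databricks notebook, `spark` is available as a global.
build_orders_silver(spark)
```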
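The "advanced SQL" skill (joins, window functions, CTEs, aggregation) can be illustrated with a query run through `spark.sql`. The table and columns below are the same hypothetical ones as in the previous sketch and are assumptions for illustration only.

```python
# CTE + window function sketch: latest order per customer (hypothetical schema).
latest_orders = spark.sql("""
    WITH ranked_orders AS (
        SELECT
            customer_id,
            order_id,
            order_amount,
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_ts DESC
            ) AS rn
        FROM analytics.orders_silver
    )
    SELECT customer_id, order_id, order_amount
    FROM ranked_orders
    WHERE rn = 1
""")

latest_orders.show(5)
```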
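The Auto Loader / Structured Streaming item is sketched below as an incremental ingest into a Delta "bronze" table. The `cloudFiles` options shown are commonly used ones rather than an exhaustive configuration, the `availableNow` trigger assumes a recent Databricks runtime, and the ADLS paths and table name are hypothetical.

```python
# Auto Loader (cloudFiles) structured-streaming ingest sketch (hypothetical paths/tables).
LANDING_PATH = "abfss://landing@exampleaccount.dfs.core.windows.net/events/"
CHECKPOINT_PATH = "abfss://landing@exampleaccount.dfs.core.windows.net/_checkpoints/events/"
BRONZE_TABLE = "analytics.events_bronze"

stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", CHECKPOINT_PATH)   # Auto Loader tracks inferred schema here
    .load(LANDING_PATH)
)

query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", CHECKPOINT_PATH)
    .trigger(availableNow=True)   # process all pending files, then stop (incremental batch style)
    .toTable(BRONZE_TABLE)
)
```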