Responsibilities:
- Data Pipeline Design & Build: Design and build robust data pipelines using Spark SQL and PySpark in Azure Databricks (a sketch follows this list).
- ETL Pipeline Development: Design and build efficient ETL pipelines using Azure Data Factory (ADF).
- Lakehouse Architecture: Build and maintain a Lakehouse architecture on Azure Data Lake Storage (ADLS) and Databricks.
- Data Preparation: Perform comprehensive data preparation, including cleaning, normalization, deduplication, and type conversion (illustrated in the sketch after this list).
- Production Deployment: Collaborate with the DevOps team to deploy data solutions into production environments.
- Data Process Control & Correction: Monitor and control data processes and take immediate corrective action when errors are identified, including executing workarounds and identifying root causes and permanent fixes.
- Team Collaboration: Participate as a full member of the global Analytics team, providing solutions and insights on data-related issues.
- Knowledge Sharing: Collaborate with Data Science and Business Intelligence colleagues globally to share key learnings, exchange ideas and solutions, and propagate best practices.
- Project Leadership: Lead projects that include other team members and actively participate in projects led by other team members.
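
For illustration, the following is a minimal PySpark sketch of the kind of pipeline described above: it reads raw files from ADLS, applies the data-preparation steps listed (cleaning, normalization, type conversion, deduplication), and persists the result as a Delta table in the Lakehouse. All storage paths, the account name, and the columns order_id, customer_email, and amount are hypothetical placeholders, not part of this role's actual stack.

    from pyspark.sql import SparkSession, functions as F

    # In a Databricks notebook a SparkSession named spark is provided
    # automatically; this line matters only when running elsewhere.
    spark = SparkSession.builder.appName("orders_prep").getOrCreate()

    # Read raw JSON files from a hypothetical ADLS container.
    raw = spark.read.json("abfss://raw@myaccount.dfs.core.windows.net/orders/")

    prepared = (
        raw
        .dropna(subset=["order_id"])                                  # cleaning: drop rows missing the key
        .withColumn("customer_email",
                    F.lower(F.trim(F.col("customer_email"))))         # normalization of a text field
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))  # type conversion from string
        .dropDuplicates(["order_id"])                                 # deduplication on the business key
    )

    # Lakehouse step: persist the curated data as a Delta table in ADLS
    # (Delta Lake support is built into Databricks).
    (prepared.write
        .format("delta")
        .mode("overwrite")
        .save("abfss://curated@myaccount.dfs.core.windows.net/orders/"))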
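
Since the role pairs Spark SQL with PySpark, the same curated data can also be exposed and queried through SQL. This sketch carries the same assumptions as above, plus a hypothetical order_date column in the data.

    # Register the curated Delta location as a table, then aggregate in SQL.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS curated_orders
        USING DELTA
        LOCATION 'abfss://curated@myaccount.dfs.core.windows.net/orders/'
    """)

    daily_totals = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM curated_orders
        GROUP BY order_date
        ORDER BY order_date
    """)
    daily_totals.show()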
Required Skills:
- Proficiency in PySpark.
- Expertise in the Azure cloud platform.
- Strong knowledge of Azure Data Factory (ADF).
- Hands-on experience with Databricks.
- Proficiency in ETL processes.
- Strong SQL skills.
Good-to-Have Skills:
- Familiarity with Change Management tools.
- Knowledge of DevOps practices.