ROLE RESPONSIBILITIES
Data Engineering and Processing:
• Develop and manage data pipelines using PySpark on Databricks (a minimal PySpark sketch follows this list).
• Implement ETL/ELT processes to handle structured and unstructured data at scale.
• Optimize data pipelines for performance, scalability, and cost-efficiency in Databricks.
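For illustration, a minimal sketch of the kind of PySpark ETL pipeline this role involves. The storage path, table name (curated.events_daily), and column names are hypothetical placeholders, not a prescribed implementation:

    # Minimal PySpark ETL sketch for Databricks. The storage path, table
    # name, and columns below are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events-etl").getOrCreate()

    # Extract: read raw JSON landed in cloud storage (path is illustrative).
    raw = spark.read.json("abfss://landing@example.dfs.core.windows.net/events/")

    # Transform: basic cleansing plus a daily aggregate.
    daily = (
        raw.filter(F.col("event_type").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
           .groupBy("event_date", "event_type")
           .agg(F.count("*").alias("event_count"))
    )

    # Load: write a Delta table partitioned by date for partition pruning.
    (daily.write.format("delta")
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("curated.events_daily"))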
Databricks Platform Expertise:
• Perform design, development, and deployment using Azure services (Data Factory, Databricks, PySpark, SQL).
• Develop and maintain scalable data pipelines and build new data source integrations to support increasing data volume and complexity.
• Leverage the Databricks Lakehouse architecture for advanced analytics and machine learning
workflows.
• Manage Delta Lake for ACID transactions and data versioning (a Delta Lake sketch follows this list).
• Develop notebooks and workflows for end-to-end data solutions.
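A hedged sketch of the Delta Lake work described above: an ACID upsert via MERGE plus time travel for data versioning. The table names (curated.customers, staging.customer_updates) and join key are illustrative assumptions:

    # Delta Lake sketch: ACID upsert (MERGE) and time travel.
    # Table names and the join key are hypothetical placeholders.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    updates = spark.table("staging.customer_updates")  # illustrative source

    # ACID upsert: matched rows are updated, new rows inserted, atomically.
    target = DeltaTable.forName(spark, "curated.customers")
    (target.alias("t")
           .merge(updates.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

    # Data versioning: time travel back to an earlier table version.
    v0 = spark.sql("SELECT * FROM curated.customers VERSION AS OF 0")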
Cloud Platforms and Deployment:
• Deploy and manage Databricks on Azure (Azure Databricks).
• Use Databricks Jobs, Clusters, and Workflows to orchestrate data pipelines (a Jobs API sketch follows this list).
• Optimize resource utilization and troubleshoot performance issues on the Databricks platform.
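As one concrete example of pipeline orchestration, a sketch that creates a scheduled job through the Databricks Jobs 2.1 REST API. The workspace URL, token, notebook path, cron expression, and cluster spec are all assumptions:

    # Sketch: create a scheduled Databricks Job via the Jobs 2.1 REST API.
    # Host, token, notebook path, and cluster spec are placeholders.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]   # e.g. https://adb-<id>.azuredatabricks.net
    token = os.environ["DATABRICKS_TOKEN"]

    job_spec = {
        "name": "nightly-events-etl",
        "tasks": [{
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/data/etl/events_daily"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }],
        # Run nightly at 02:00 UTC (Quartz cron syntax).
        "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
    }

    resp = requests.post(f"{host}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {token}"},
                         json=job_spec)
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])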
CI/CD and Testing:
• Build and maintain CI/CD pipelines for Databricks workflows using tools like Azure
DevOps, GitHub Actions, or Jenkins.
• Write unit and integration tests for PySpark code using frameworks such as pytest or unittest (a pytest sketch follows this list).
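A hedged sketch of the testing practice above: a local pytest unit test for a PySpark transformation, where the function under test (add_event_date) is a hypothetical example:

    # pytest sketch for PySpark code; add_event_date is a hypothetical
    # transformation used only to illustrate the testing pattern.
    import pytest
    from pyspark.sql import SparkSession, functions as F

    def add_event_date(df):
        # Transformation under test: derive a date column from a timestamp string.
        return df.withColumn("event_date", F.to_date("event_ts"))

    @pytest.fixture(scope="session")
    def spark():
        # Local SparkSession so the test runs without a Databricks cluster.
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_add_event_date(spark):
        df = spark.createDataFrame([("2024-01-15 10:30:00",)], ["event_ts"])
        row = add_event_date(df).collect()[0]
        assert str(row["event_date"]) == "2024-01-15"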
Collaboration and Documentation:
• Work closely with data scientists, data analysts, and IT teams to deliver robust data solutions.
• Document Databricks workflows, configurations, and best practices for internal use.
TECHNICAL QUALIFICATIONS
Experience:
• 7+ years of experience in data engineering or distributed systems development.
• Strong programming skills in Python and PySpark (7+ years).
• Hands-on experience with Databricks and its ecosystem, including Delta Lake and Databricks
SQL.
• Knowledge of big data frameworks like Hadoop, Spark, and Kafka.
Databricks Expertise:
• Proficiency in setting up and managing Databricks Workspaces, Clusters, and Jobs.
• Familiarity with MLflow on Databricks for machine learning workflows is a plus.
Cloud Platforms:
• Expertise in deploying Databricks solutions on Azure (e.g., Data Lake Storage, Synapse).
• Knowledge of Kubernetes for managing containerized workloads is advantageous.
Database Knowledge:
• Experience with both SQL (e.g., PostgreSQL, SQL Server) and NoSQL databases
(e.g., MongoDB, Cosmos DB).
GENERAL QUALIFICATIONS
• Strong analytical and problem-solving skills.
• Ability to manage multiple tasks in a high-intensity, deadline-driven environment.
• Excellent communication and organizational skills.
• Experience in regulated industries like insurance is a plus.
EDUCATION REQUIREMENTS
• A Bachelor's Degree in Computer Science, Data Engineering, or a related field is preferred.
• Relevant certifications in Databricks, PySpark, or cloud platforms are highly desirable.