Description
GSPANN is hiring a Senior Data Engineer to design, develop, and optimize scalable data solutions.
The role requires expertise in Azure Data Factory, Azure Databricks, PySpark, Delta Tables, and advanced data modeling, along with skills in performance optimization, API integrations, DevOps, and data governance.
Role and Responsibilities
Design, develop, and orchestrate scalable data pipelines using Azure Data Factory (ADF).
Build and manage Apache Spark clusters, create notebooks, and run jobs in Azure Databricks.
Ingest, organize, and transform data within the Microsoft Fabric ecosystem using OneLake.
Author complex transformations and write SQL (Structured Query Language) queries for large-scale data processing using PySpark and Spark SQL.
Create, optimize, and maintain Delta Lake tables, applying operations such as VACUUM, ZORDER, and OPTIMIZE.
Parse, validate, and transform semi-structured JSON (JavaScript Object Notation) datasets.
Build and consume REST/OData services for custom data ingestion through API (Application Programming Interface) integration.
Implement bronze, silver, and gold layers in data lakes using the Medallion Architecture to ensure clean and reliable data.
Apply partitioning, caching, and resource tuning to efficiently process high volumes of data for large-scale performance optimization.
Design star and snowflake schemas along with fact and dimension tables for multidimensional modeling in reporting use cases.
Work with tabular and OLAP (Online Analytical Processing) cube structures in Azure Analysis Services to enable downstream business intelligence.
Collaborate with the DevOps team to define infrastructure, manage access and security, and automate deployments.
Skills and Experience
Ingest and harmonize data from SAP (Systems, Applications, and Products) ECC (ERP Central Component) and S/4HANA systems using Data Sphere.
Use Git, Azure DevOps Pipelines, Terraform, or Azure Resource Manager (ARM) templates for CI/CD (Continuous Integration/Continuous Deployment) and DevOps tooling.
Leverage Azure Monitor, Log Analytics, and data pipeline metrics for data observability and monitoring.
Conduct query diagnostics, identify bottlenecks, and determine root causes for performance troubleshooting.
Apply metadata management, track data lineage, and enforce compliance best practices for data governance and cataloging.
Document processes, designs, and solutions effectively in Confluence.