Key Responsibilities:
- Develop batch and streaming ingestion pipelines that land data in S3 (see the ingestion sketch after this list).
- Migrate legacy HiveQL to Spark SQL/PySpark (a translation example follows the list).
- Orchestrate workflows with Amazon MWAA (Managed Workflows for Apache Airflow); a sample DAG appears below.
- Build and manage Apache Iceberg tables with appropriate partitioning and metadata maintenance (see the Iceberg sketch below).
- Validate jobs and implement unit tests (a pytest example follows).
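For illustration, a minimal sketch of the kind of batch ingestion this role involves: read raw files and land them in S3 as partitioned Parquet. The bucket names, paths, and the order_date column are hypothetical placeholders; a real pipeline would add schema enforcement and error handling.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-ingest").getOrCreate()

# Read raw CSV drops and land them in the lake as partitioned Parquet.
df = spark.read.option("header", "true").csv("s3://source-bucket/raw/orders/")
(df.write
   .mode("append")
   .partitionBy("order_date")   # assumes an order_date column exists in the source
   .parquet("s3://lake-bucket/bronze/orders/"))
```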
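Much HiveQL runs unchanged under Spark SQL, so migration often starts by executing the original statement via spark.sql; the DataFrame API is the usual target when a rewrite is warranted. A hypothetical translation (table and column names are made up):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Original HiveQL:
#   INSERT OVERWRITE TABLE sales_agg PARTITION (dt)
#   SELECT region, SUM(amount) AS total, dt FROM sales GROUP BY region, dt;

# PySpark equivalent using the DataFrame API:
(spark.table("sales")
      .groupBy("region", "dt")
      .agg(F.sum("amount").alias("total"))
      .select("region", "total", "dt")   # insertInto matches columns by position
      .write.mode("overwrite")
      .insertInto("sales_agg"))
```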
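A sketch of what MWAA orchestration might look like: an Airflow DAG that submits a Spark step to an existing EMR cluster through the Amazon provider package. The DAG id, cluster id, and script path are placeholders, not details from this posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

SPARK_STEP = [{
    "Name": "ingest-orders",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://lake-bucket/jobs/ingest_orders.py"],
    },
}]

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    submit_spark_step = EmrAddStepsOperator(
        task_id="submit_spark_step",
        job_flow_id="j-XXXXXXXXXXXX",   # existing EMR cluster id (placeholder)
        steps=SPARK_STEP,
    )
```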
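For the Iceberg work, a sketch of creating a table with hidden-partitioning transforms plus one routine metadata-maintenance call. The catalog name "glue", the schema, and the cutoff timestamp are assumptions about the environment.

```python
from pyspark.sql import SparkSession

# Assumes the cluster is already configured with an Iceberg catalog named
# "glue" (spark.sql.catalog.glue = org.apache.iceberg.spark.SparkCatalog, etc.).
spark = SparkSession.builder.appName("iceberg-ddl").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue.analytics.orders (
        order_id BIGINT,
        region   STRING,
        amount   DECIMAL(10, 2),
        order_ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts), region)   -- hidden partitioning transforms
""")

# Routine maintenance: expire old snapshots so table metadata does not grow unbounded.
spark.sql("""
    CALL glue.system.expire_snapshots(
        table => 'analytics.orders',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```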
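Finally, a small pytest-style unit test against a local SparkSession, a common pattern for validating PySpark transformations; the transformation function here is illustrative and would normally live in the job module.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    # Local session so tests run without a cluster.
    return (SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate())


def aggregate_by_region(df):
    # Transformation under test (illustrative).
    return df.groupBy("region").agg(F.sum("amount").alias("total"))


def test_aggregate_by_region(spark):
    df = spark.createDataFrame(
        [("east", 10.0), ("east", 5.0), ("west", 2.0)],
        ["region", "amount"],
    )
    result = {r["region"]: r["total"] for r in aggregate_by_region(df).collect()}
    assert result == {"east": 15.0, "west": 2.0}
```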
Required Skills:
- 3-5 years of data engineering experience, with strong AWS expertise.
- Proficient in EMR (Spark), S3, PySpark, and SQL.
- Familiar with Cloudera/HDFS and legacy Hadoop pipelines.
- Knowledge of data lake/lakehouse architectures is a plus.
Skills Required:
S3, PySpark, HiveQL, Spark SQL