Role: AWS Data Engineer
Experience: 8+ years
Location: Remote
Time Zone: UK
Duration: 2 months (extendable)

Job Description:
- Design, develop, and implement performant ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR (an illustrative sketch follows this list).
- Write reusable, testable, and efficient code.
- Integrate data storage solutions with Spark, especially AWS S3 object storage.
- Performance-tune PySpark scripts.
- Ensure overall build delivery quality and on-time delivery at all times.
- Handle meetings with customers with ease.
- Communicate clearly with the customer; excellent communication skills are essential.
- Be a team player willing to work in an onsite-offshore model and mentor others in the team (onsite as well as offshore).
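
For illustration only, below is a minimal PySpark ETL sketch of the kind described above: reading raw data from S3, applying a simple transformation, and writing partitioned Parquet back to S3 on an EMR cluster. The bucket names, paths, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build (or reuse) the Spark session; on EMR, spark-submit provides the cluster context.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV data from S3 (EMR's EMRFS handles the s3:// scheme).
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/orders/")  # hypothetical bucket
)

# Transform: cast types, drop invalid rows, derive a partition column.
orders = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet back to S3 for downstream consumers.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")  # hypothetical bucket
)

spark.stop()
```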

Required Skills and Experience:
*5+ years of experience in programming with Python.
*Strong proficiency in Python.
*Familiarity with functional programming concepts.
*3+ years of hands-on experience developing ETL data pipelines using PySpark on AWS EMR.
*Experience building pipelines and data lakes for large enterprises on AWS.
*Good understanding of Spark's DataFrame API (see the tuning sketch after this list).
*Experience configuring EMR clusters on AWS.
*Experience working with AWS S3 object storage from Spark.
*Experience troubleshooting Spark jobs, including monitoring them through the Spark UI.
*Performance tuning of Spark jobs.
*Understanding of the fundamental design principles behind business processes.
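
As a rough sketch of the DataFrame API and performance-tuning items above (not a prescription; the configuration values and paths are placeholders), typical levers include right-sizing shuffle partitions, enabling adaptive query execution, broadcasting small dimension tables, and caching only DataFrames that are reused, with the Spark UI used to verify the resulting plans.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-demo")
    # Right-size shuffle parallelism instead of relying on the default of 200 partitions.
    .config("spark.sql.shuffle.partitions", "64")
    # Adaptive Query Execution (Spark 3.x) coalesces partitions and mitigates skew at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

facts = spark.read.parquet("s3://example-bucket/facts/")  # hypothetical path
dims = spark.read.parquet("s3://example-bucket/dims/")    # hypothetical path

# Broadcast the small dimension table to avoid a shuffle-heavy sort-merge join.
joined = facts.join(F.broadcast(dims), on="dim_id", how="left")

# Cache only DataFrames that are reused; release the memory when done.
joined.cache()
summary = joined.groupBy("dim_id").agg(F.sum("amount").alias("total_amount"))
summary.show()  # verify the join strategy and partition sizes in the Spark UI (SQL/Stages tabs)
joined.unpersist()

spark.stop()
```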

Process Knowledge and Expertise:
- Demonstrated experience in change management processes, including understanding of governance frameworks and preparation of supporting artefacts required for approvals.
- Strong clarity on the path to production, with hands-on involvement in deployments, testing cycles, and obtaining business sign-offs.
- Proven track record in technical solution design, with the ability to provide architectural guidance and support implementation strategies.

Databricks-Specific Skills:
- Experience developing and delivering at least one end-to-end Proof of Concept (POC) solution covering the following:
  - Basic proficiency in Databricks, including creating jobs and configuring clusters.
  - Exposure to connecting external data sources (e.g., Amazon S3) to Databricks.
  - Understanding of Unity Catalog and its role in data governance.
  - Familiarity with notebook orchestration and modular code structures that enhance scalability and maintainability (a rough sketch follows this list).
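
A minimal, notebook-style sketch of the Databricks items above, assuming a workspace where S3 access and Unity Catalog are already set up; the catalog, schema, table, and bucket names are hypothetical, and job/cluster creation itself would be done through the Databricks Jobs UI or API rather than in this snippet.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks this returns the notebook's existing session; elsewhere it creates one.
spark = SparkSession.builder.getOrCreate()

# Read an external data source (Amazon S3) that the workspace has been granted access to.
events = spark.read.json("s3://example-landing-bucket/events/")  # hypothetical bucket

# Small transformation: daily counts per event type.
daily = (
    events.withColumn("event_date", F.to_date("event_ts"))
          .groupBy("event_date", "event_type")
          .count()
)

# Persist as a governed table using Unity Catalog's catalog.schema.table namespace.
(
    daily.write
    .mode("overwrite")
    .saveAsTable("main.analytics.daily_event_counts")  # hypothetical catalog/schema/table
)
```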
 
 
Important Pointers:
- Candidates must have actual hands-on work experience, not just home projects or academic exercises.
- Profiles should clearly state how much experience the candidate has in each skill area, as this helps streamline the interview process.
- Candidates must know their CV/profile inside out, including all projects and responsibilities listed. Any ambiguity or lack of clarity on the candidate's part can lead to immediate rejection, as we value accuracy and ownership.
- Candidates should be able to confidently explain their past experience, challenges handled, and technical contributions.