Role Overview
We are seeking a skilled and motivated Data Engineer with 5–8 years of experience in building scalable data pipelines using Python, PySpark, and AWS services.
The ideal candidate will have hands-on expertise in big data processing, orchestration using AWS Step Functions, and serverless computing with AWS Lambda.
Familiarity with DynamoDB and hands-on experience deploying ETL programs in AWS are essential.
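To give a concrete sense of the day-to-day work, here is a minimal, illustrative PySpark sketch of the kind of pipeline this role builds: reading raw data from S3, applying a simple transformation, and writing curated output back to S3. Bucket names, paths, and columns are placeholders, not details of our actual stack.

    # Illustrative sketch only: bucket names, paths, and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-daily-etl").getOrCreate()

    # Read raw JSON events from S3
    raw = spark.read.json("s3://example-raw-bucket/events/")

    # Light cleansing and a daily aggregation
    daily_counts = (
        raw.filter(F.col("event_type").isNotNull())
           .withColumn("event_date", F.to_date("event_timestamp"))
           .groupBy("event_date", "event_type")
           .count()
    )

    # Write curated, partitioned Parquet back to S3
    (daily_counts.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/daily_event_counts/"))

    spark.stop()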
Key Responsibilities
- Design, develop, and maintain robust data pipelines using Python and PySpark
- Handle large-scale data processing and transformation using AWS services
- Implement orchestration workflows using AWS Step Functions
- Develop and manage serverless components using AWS Lambda (see the sketch following this list)
- Deploy and monitor ETL programs in AWS environments
- Configure and optimize DynamoDB for data storage and retrieval
- Collaborate with cross-functional teams to understand data requirements and deliver scalable solutions
- Ensure data quality, integrity, and security across all stages of the pipeline
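As a brief illustration of the serverless and DynamoDB responsibilities above, the sketch below shows a minimal Lambda handler that records pipeline run metadata in a DynamoDB table. The table name, environment variable, and item shape are hypothetical.

    # Illustrative sketch only: table name, env var, and item shape are hypothetical.
    import os
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(os.environ.get("RUN_TABLE", "example-pipeline-runs"))

    def handler(event, context):
        # Record a simple status item for the pipeline run that invoked this function
        table.put_item(
            Item={
                "run_id": event["run_id"],                # partition key (hypothetical schema)
                "status": event.get("status", "STARTED"),
                "source": event.get("source", "unknown"),
            }
        )
        return {"run_id": event["run_id"], "recorded": True}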
Required Skills & Qualifications
- 5–8 years of experience in data engineering or a related field
- Strong proficiency in Python and PySpark
- Solid understanding of AWS services including S3, Lambda, Step Functions, Glue, and DynamoDB (see the orchestration sketch following this list)
- Experience deploying and managing ETL workflows in AWS
- Familiarity with NoSQL databases, especially DynamoDB
- Knowledge of CI/CD practices and infrastructure-as-code tools (e.g., CloudFormation, Terraform) is a plus
- Excellent problem-solving and communication skills
- Ability to work independently in a remote setup
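For context on the orchestration skills listed above, the sketch below starts an execution of a hypothetical Step Functions state machine that coordinates an ETL run, using boto3. The state machine ARN and input payload are placeholders.

    # Illustrative sketch only: the state machine ARN and payload are placeholders.
    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    def start_etl_run(run_id: str) -> str:
        """Start an execution of a hypothetical ETL state machine and return its execution ARN."""
        response = sfn.start_execution(
            stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:example-etl",
            name=f"etl-{run_id}",
            input=json.dumps({"run_id": run_id}),
        )
        return response["executionArn"]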
What We Offer
- Fully remote work environment
- Opportunity to work on cutting-edge data engineering projects
- Collaborative and inclusive team culture
- Competitive compensation and benefits