Key Responsibilities:
1. ETL Pipeline Development:
- Design, develop, and maintain scalable ETL processes to extract, transform, and load data from various structured and unstructured sources into GCP-based data stores and warehouses (BigQuery, Cloud SQL, Cloud Storage, etc.).
- Develop efficient SQL queries and scripts to support data transformation, aggregation, and validation.
- Optimize ETL workflows to ensure low-latency data processing and high performance.
2. Google Cloud Dataform & Data Transformation:
- Use Google Cloud Dataform to implement SQL-based data transformations in BigQuery, following best practices in data modeling, version control, and dependency management.
- Develop modular SQL workflows using Dataform to simplify transformation logic and enhance reusability.
- Integrate Dataform into existing ETL/ELT pipelines to streamline data engineering and analytics workflows.
- Leverage Dataform's automated testing, scheduling, and Git-based version control for collaborative development and data quality assurance.
3. Data Integration & Management:
- Work with diverse data sources (databases, APIs, streaming data, and cloud storage) to integrate data into centralized repositories.
- Ensure data consistency, integrity, and accuracy through rigorous testing and validation.
- Implement incremental data loads, change data capture (CDC), and batch/real-time ETL strategies.
- Leverage GCP services like Dataflow, Dataproc, Cloud Functions, and Pub/Sub to handle data ingestion and transformation.
4. Database & SQL Development:
- Write complex SQL queries, stored procedures, and functions to support analytical and operational data needs.
- Tune SQL queries for performance and cost efficiency in BigQuery, Cloud SQL, and other relational databases.
- Ensure proper indexing, partitioning, and clustering strategies for optimal query performance.
5. Cloud & DevOps Integration:
- Deploy and monitor ETL workflows using GCP-native tools (Cloud Composer/Airflow, Dataform, Dataflow, Dataprep, etc.).
- Implement CI/CD pipelines for ETL jobs using Terraform, Cloud Build, GitHub Actions, or Jenkins.
- Work with Infrastructure and DevOps teams to ensure secure and reliable deployment of ETL solutions in a cloud environment.
6. Data Quality & Governance:
- Implement data validation, data cleansing, and error-handling mechanisms in ETL pipelines.
- Monitor data pipeline performance and ensure timely resolution of issues and failures.
- Work with stakeholders to define data governance policies, metadata management, and access controls.
7. Documentation & Collaboration:
- Maintain comprehensive documentation for ETL workflows, data transformations, and technical design.
- Collaborate with data engineers, data analysts, and business teams to understand data needs and optimize data processing workflows.
- Conduct code reviews and provide mentorship to junior developers when necessary.
Required Skills & Qualifications:
1. Technical Skills:
ETL Development:
- Hands-on experience in designing and implementing ETL pipelines.
- Proficiency in ETL tools such as Apache Airflow (Cloud Composer), Dataflow, or Informatica.
SQL & Database Management:
- Strong expertise in SQL (DDL, DML, performance tuning, indexing, partitioning, stored procedures, etc.).
- Experience working with relational databases (Cloud SQL, PostgreSQL, MySQL) and NoSQL databases (Bigtable, Firestore, MongoDB, etc.).
Cloud (GCP) Expertise:
- Strong hands-on experience with Google Cloud Platform (GCP) services:
  - BigQuery (data warehousing & analytics)
  - Cloud Storage (data lake storage)
  - Cloud Composer / Apache Airflow (workflow orchestration)
  - Cloud Functions (serverless ETL tasks)
  - Cloud Dataflow (Apache Beam-based data processing)
  - Pub/Sub (real-time streaming)
  - Dataproc (Hadoop/Spark-based processing)
  - Google Cloud Dataform (SQL-based transformations for BigQuery)
Programming & Scripting:
- Experience with Python, SQL scripting, and Shell scripting for ETL automation.
- Knowledge of PySpark or Apache Beam is a plus.
CI/CD & DevOps:
- Experience deploying ETL workflows using Terraform, Cloud Build, or Jenkins.
- Familiarity with Git/GitHub for version control.
Skills Required:
GitHub, Git, SQL, Python, Terraform