Job Description
About the job:

We are seeking a highly skilled ETL Specialist with expertise in building, maintaining, and optimizing data pipelines in Python. The ideal candidate has experience working in a Linux environment, managing large-scale data ingestion, processing files in S3, and balancing disk space and warehouse storage efficiently. This role is responsible for ensuring seamless data movement across systems while maintaining performance, scalability, and reliability.

Key Responsibilities:

- ETL Pipeline Development: Design, develop, and maintain efficient ETL workflows in Python to extract, transform, and load data into structured data warehouses (see the illustrative sketch after this list).
- Data Pipeline Optimization: Monitor and optimize data pipeline performance, ensuring scalability and reliability when handling large data volumes.
- Linux Server Management: Work in a Linux-based environment, executing command-line operations, managing processes, and troubleshooting system performance issues.
- File Handling & Storage Management: Manage data files in Amazon S3 efficiently, ensuring proper storage organization, retrieval, and archiving.
- Disk Space & Warehouse Balancing: Proactively monitor and manage disk space usage, preventing storage bottlenecks and keeping the warehouse efficient.
- Error Handling & Logging: Implement robust error handling and logging to monitor data pipeline health.
- Automation & Scheduling: Automate ETL processes using cron jobs, Airflow, or other workflow orchestration tools (a minimal scheduling sketch appears at the end of this description).
- Data Quality & Validation: Ensure data integrity and consistency through validation checks and reconciliation processes.
- Security & Compliance: Follow best practices in data security, access control, and compliance when handling sensitive data.
- Collaboration with Teams: Work closely with data engineers, analysts, and product teams to align data processing with business needs.
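For illustration only (not part of the original posting): a minimal sketch of the kind of workflow these responsibilities describe, assuming a CSV file in S3 and a PostgreSQL-compatible warehouse reachable via psycopg2. The bucket, object key, table name, and connection string are placeholders.

```python
"""Minimal ETL sketch: pull a CSV from S3, apply a simple transform,
and load the rows into a warehouse table. All names are placeholders."""
import csv
import io
import logging

import boto3
import psycopg2

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

S3_BUCKET = "example-raw-data"       # placeholder bucket
S3_KEY = "events/2024-01-01.csv"     # placeholder object key
TARGET_TABLE = "analytics_events"    # placeholder warehouse table


def extract(bucket: str, key: str) -> list[dict]:
    """Fetch a CSV object from S3 and parse it into a list of dicts."""
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    text = obj["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Drop malformed rows and normalise types before loading."""
    return [
        (r["event_id"], r["user_id"], int(r["amount_cents"]))
        for r in rows
        if r.get("event_id") and r.get("amount_cents", "").isdigit()
    ]


def load(rows: list[tuple]) -> None:
    """Insert the transformed rows into the warehouse in one transaction."""
    with psycopg2.connect("dbname=warehouse") as conn:  # placeholder DSN
        with conn.cursor() as cur:
            cur.executemany(
                f"INSERT INTO {TARGET_TABLE} "
                "(event_id, user_id, amount_cents) VALUES (%s, %s, %s)",
                rows,
            )
    log.info("loaded %d rows into %s", len(rows), TARGET_TABLE)


if __name__ == "__main__":
    try:
        load(transform(extract(S3_BUCKET, S3_KEY)))
    except Exception:
        # Surface the full traceback so failures reach monitoring/alerting.
        log.exception("ETL run failed")
        raise
```

In a production pipeline the transform rules, credential handling, and retry logic would be more involved; the point of the sketch is the extract/transform/load separation plus the logging and error handling called out above.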
Skills Required:

- Proficiency in Python: Strong hands-on experience writing Python scripts for ETL processes.
- Linux Expertise: Experience with Linux servers, command-line operations, and system performance tuning.
- Cloud Storage Management: Hands-on experience with Amazon S3, including file storage, retrieval, and lifecycle policies.
- Data Pipeline Management: Experience with ETL frameworks, data pipeline automation, and workflow scheduling (e.g., Apache Airflow, Luigi, or Prefect).
- SQL & Database Handling: Strong SQL skills for extracting, transforming, and loading data into relational databases and data warehouses.
- Disk Space & Storage Optimization: Ability to manage disk space efficiently, balancing usage across systems.
- Error Handling & Debugging: Strong problem-solving skills to troubleshoot ETL failures, debug logs, and resolve data inconsistencies.

Nice to Have:

- Experience with cloud data warehouses (e.g., Snowflake, Redshift, BigQuery).
- Knowledge of message queues (Kafka, RabbitMQ) for data streaming.
- Familiarity with containerization tools (Docker, Kubernetes) for deployment.
- Exposure to infrastructure automation tools (Terraform, Ansible).

Qualifications:

- Bachelor's degree in Computer Science, Data Engineering, or a related field.
- 3+ years of experience in ETL development, data pipeline management, or backend data engineering.
- Strong analytical mindset and the ability to handle large-scale data processing efficiently.
- Ability to work independently in a fast-paced, product-driven environment.
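As referenced in the Automation & Scheduling responsibility above, here is a minimal scheduling sketch (illustrative only, not part of the original posting), assuming Airflow 2.x: a daily DAG with retries whose single task would wrap the extract/transform/load logic from the earlier sketch. The DAG id, schedule, and callable are placeholders.

```python
"""Minimal Airflow 2.x DAG sketch for scheduling a daily ETL run
with retries. DAG id, schedule, and callable are placeholders."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl() -> None:
    # Placeholder: call the extract/transform/load steps here.
    pass


with DAG(
    dag_id="daily_s3_to_warehouse",      # placeholder DAG id
    schedule_interval="@daily",          # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,                       # skip backfill of missed runs
    default_args={
        "retries": 2,                    # retry transient failures
        "retry_delay": timedelta(minutes=10),
    },
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)
```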