Job Description
            
Senior Software Engineer, Remote

The Software Engineer, Data Ingestion will be a critical individual contributor responsible for designing collection strategies and for developing and maintaining robust, scalable data pipelines. This role sits at the heart of our data ecosystem, delivering new analytical software solutions that provide access to timely, accurate, and complete data for insights, products, and operational efficiency.

Key Responsibilities:

- Design, develop, and maintain high-performance, fault-tolerant data ingestion pipelines using Python.
- Integrate with diverse data sources (databases, APIs, streaming platforms, cloud storage, etc.).
- Implement data transformation and cleansing logic during ingestion to ensure data quality.
- Monitor and troubleshoot data ingestion pipelines, identifying and resolving issues promptly.
- Collaborate with database engineers to optimize data models for fast consumption.
- Evaluate and propose new technologies or frameworks to improve ingestion efficiency and reliability.
- Develop and implement self-healing mechanisms for data pipelines to ensure continuity.
- Define and uphold SLAs and SLOs for data freshness, completeness, and availability.
- Participate in the on-call rotation as needed for critical data pipeline issues.

Key Skills:

- 5+ years of experience, ideally with a background in Computer Science, working in software product companies.
- Extensive Python Expertise: Extensive experience developing robust, production-grade applications in Python.
- Data Collection & Integration: Proven experience collecting data from various sources (REST APIs, OAuth, GraphQL, Kafka, S3, SFTP, etc.).
- Distributed Systems & Scalability: Strong understanding of distributed systems concepts, designing for scale, performance optimization, and fault tolerance.
- Cloud Platforms: Experience with major cloud providers (AWS or GCP) and their data-related services (e.g., S3, EC2, Lambda, SQS, Kafka, Cloud Storage, GKE).
- Database Fundamentals: Solid understanding of relational databases (SQL, schema design, indexing, query optimization). OLAP database experience is a plus (e.g., Hadoop).
- Monitoring & Alerting: Experience with monitoring tools (e.g., Prometheus, Grafana) and setting up effective alerts.
- Version Control: Proficiency with Git.
- Containerization (Plus): Experience with Docker and Kubernetes.
- Streaming Technologies (Plus): Experience with real-time data processing using Kafka, Flink, or Spark Streaming.