Job Description
What You'll Do:

- Be the Data Tech Leader: Mentor engineers, champion data engineering best practices, and raise the bar for technical excellence across the org.
- Architect at Scale: Design and lead petabyte-scale data ingestion, processing, and analytics platforms using Snowflake, Apache Spark, Iceberg, Parquet, and AWS-native services.
- Own the Data Flow: Build streaming and batch pipelines handling billions of events daily, orchestrated through Apache Airflow for reliability and fault tolerance.
- Set the Standards: Define frameworks for data modeling, schema evolution, partitioning strategies, and data quality/observability for analytics and AI workloads.
- Code Like a Pro: Stay hands-on, writing high-performance data processing jobs in Python, SQL, and Scala, and conducting deep-dive reviews when it matters most.
- Master the Lakehouse: Architect data lake and warehouse solutions that balance cost, performance, and scalability, leveraging AWS S3 and Snowflake.
- Solve Complex Problems: Debug and optimize long-running jobs, data skew, and high-volume ETL bottlenecks elegantly and efficiently.
- Collaborate and Influence: Work with the Product, AI/ML, and Platform teams to ensure that data solutions directly power real-time cyber risk analytics.
- Innovate Constantly: Evaluate and introduce emerging data technologies (e.g., Flink, Druid, Rockset) to keep SAFE at the forefront of data engineering innovation.

What We're Looking For:

- 8+ years of experience in data engineering, with a proven track record of designing and scaling distributed data systems.
- Deep expertise in big data processing frameworks (Apache Spark, Flink) and workflow orchestration (Airflow).
- Strong hands-on experience with data warehousing (Snowflake) and data lakehouse architectures (Iceberg, Parquet).
- Proficiency in Python, SQL, Scala, and Go/Node.js, with the ability to optimize large-scale ETL/ELT workloads.
- Expertise in real-time data ingestion pipelines using Kafka or Kinesis, handling billions of events daily.
- Experience operating in cloud-native environments (AWS) and leveraging services like S3, Lambda, ECS, Glue, and Athena.
- Strong understanding of data modeling, schema design, indexing, and query optimization for analytical workloads.
- Proven leadership in mentoring engineers, driving architectural decisions, and aligning data initiatives with product goals.
- Experience with streaming architectures, CDC pipelines, and data observability frameworks.
- Ability to navigate ambiguous problems and high-scale challenges and to lead teams toward innovative solutions.
- Proficiency in deploying containerized applications (Docker, Kubernetes, ECS).
- Familiarity with AI coding assistants like Cursor, Claude Code, or GitHub Copilot.

Preferred Qualifications:

- Exposure to CI/CD pipelines, automated testing, and infrastructure-as-code for data workflows.
- Familiarity with real-time analytics engines (Druid, Pinot, Rockset) or machine learning data pipelines.
- Contributions to open-source data projects or thought leadership in the data engineering community.
- Prior experience in cybersecurity, risk quantification, or another high-scale SaaS domain.