About the Role  
As a Quantitative Data Engineer, you will be the backbone of the data ecosystem powering our quantitative research, trading, and AI-driven strategies.
You will design, build, and maintain the high-performance data infrastructure that enables low-latency, high-fidelity access to market, fundamental, and alternative data across multiple asset classes.
This role bridges quant engineering, data systems, and research enablement, ensuring that our researchers and traders have fast, reliable, and well-documented datasets for analysis and live trading.
You’ll be part of a cross-functional team working at the intersection of finance, machine learning, and distributed systems.
Responsibilities  
- Architect and maintain scalable ETL pipelines for ingesting and transforming terabytes of structured, semi-structured, and unstructured market and alternative data.
- Design time-series optimized data stores and streaming frameworks to support low-latency data access for both backtesting and live trading.
- Develop ingestion frameworks integrating vendor feeds (Bloomberg, Refinitiv, Polygon, Quandl, etc.), exchange data, and internal execution systems.
- Collaborate with quantitative researchers and ML teams to ensure data accuracy, feature availability, and schema evolution aligned with modeling needs.
- Implement data quality checks, validation pipelines, and version control mechanisms for all datasets.
- Monitor and optimize distributed compute environments (Spark, Flink, Ray, or Dask) for performance and cost efficiency.
- Automate workflows using orchestration tools (Airflow, Prefect, Dagster) for reliability and reproducibility (a minimal sketch of such a pipeline follows this list).
- Establish best practices for metadata management, lineage tracking, and documentation.
- Contribute to internal libraries and SDKs for seamless data access by trading and research applications.
 
 
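As a rough illustration of the orchestration and data-quality work described above, here is a minimal sketch of a daily vendor-feed pipeline using Airflow's TaskFlow API. The DAG name, file paths, and validation thresholds are illustrative placeholders, not a description of any production system.

```python
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def vendor_eod_ingestion():
    """Land one day of vendor end-of-day data, validate it, publish Parquet."""

    @task
    def extract(ds=None) -> str:
        # The actual vendor download is omitted; this just returns the path
        # where the day's raw file would land, keyed by the execution date.
        return f"/data/raw/vendor_eod/{ds}.csv"

    @task
    def validate(raw_path: str) -> str:
        # Lightweight data-quality gates; thresholds here are illustrative.
        df = pd.read_csv(raw_path)
        assert df["symbol"].notna().all(), "missing symbols"
        assert (df["close"] > 0).all(), "non-positive close prices"
        assert 1_000 < len(df) < 100_000, "row count outside expected band"
        return raw_path

    @task
    def load(raw_path: str) -> None:
        # Publish a curated, columnar copy for research and backtesting.
        df = pd.read_csv(raw_path)
        df.to_parquet(
            raw_path.replace("/raw/", "/curated/").replace(".csv", ".parquet")
        )

    load(validate(extract()))


vendor_eod_ingestion()
```

In practice the extract step would call a vendor SDK and the validation gates would be far richer (for example, Great Expectations suites), but the shape of the DAG is the same.
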
In Trading Firms, Data Engineers Typically:  
- Build real-time data streaming systems to capture market ticks, order books, and execution signals.
- Manage versioned historical data lakes for backtesting and model training.
- Handle multi-venue data normalization (different exchanges and instruments).
- Integrate alternative datasets (satellite imagery, news sentiment, ESG, supply-chain data).
- Work closely with quant researchers to convert raw data into research-ready features (see the bar-building sketch after this list).
- Optimize pipelines for ultra-low latency, where milliseconds can impact P&L.
- Implement data observability frameworks to ensure uptime and quality.
- Collaborate with DevOps and infra engineers to scale storage, caching, and compute.
 
 
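A common first step in that raw-to-feature hand-off is turning trade ticks into bars. The sketch below assumes a Parquet tick store with timestamp, symbol, price, and size columns and builds one-minute OHLCV bars with Polars; the paths and schema are assumptions for the example.

```python
import polars as pl

# Lazily scan one day of raw trade ticks; the directory layout is assumed.
ticks = (
    pl.scan_parquet("/data/ticks/2024-06-03/*.parquet")
      .sort("timestamp")  # dynamic grouping expects the time column sorted
)

# Resample ticks into one-minute OHLCV bars per symbol.
bars = (
    ticks.group_by_dynamic("timestamp", every="1m", group_by="symbol")
         .agg(
             pl.col("price").first().alias("open"),
             pl.col("price").max().alias("high"),
             pl.col("price").min().alias("low"),
             pl.col("price").last().alias("close"),
             pl.col("size").sum().alias("volume"),
         )
         .collect()
)

# Write a research-ready, columnar output that backtests can read directly.
bars.write_parquet("/data/bars_1m/2024-06-03.parquet")
```

Polars here stands in for whatever engine a team actually uses (Pandas, Spark, or kdb+/q for tick data); the point is the transformation from raw ticks to research-ready features, not the specific tool.
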
Tech Stack  
- Languages: Python, SQL, Scala, Go, Rust (optional for HFT pipelines)
- Data Processing: Apache Spark, Flink, Ray, Dask, Pandas, Polars
- Workflow Orchestration: Apache Airflow, Prefect, Dagster
- Databases & Storage: PostgreSQL, ClickHouse, DuckDB, Elasticsearch, Redis
- Data Lakes: Delta Lake, Iceberg, Hudi, Parquet (see the query sketch after this list)
- Streaming: Kafka, Redpanda, Pulsar
- Cloud & Infra: AWS (S3, EMR, Lambda), GCP, Azure, Kubernetes
- Version Control & Lineage: DVC, MLflow, Feast, Great Expectations
- Visualization / Monitoring: Grafana, Prometheus, Superset, Datadog
- Tools for Finance: kdb+/q (for tick data), InfluxDB, QuestDB
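To give a sense of how these tools compose, the hypothetical snippet below uses DuckDB to query a Parquet-based tick lake directly with SQL and hand the result to pandas; the column names and directory layout are assumptions for the example.

```python
import duckdb

con = duckdb.connect()  # an in-memory database is enough for ad-hoc queries

# Rank symbols by shares traded per day across a month of tick files.
daily_volume = con.sql(
    """
    SELECT symbol,
           date_trunc('day', "timestamp") AS trade_date,
           sum(size)                      AS shares_traded
    FROM read_parquet('/data/ticks/2024-06-*/*.parquet')
    GROUP BY 1, 2
    ORDER BY shares_traded DESC
    LIMIT 20
    """
).df()  # hand back a pandas DataFrame for downstream research code

print(daily_volume)
```
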
What You Will Gain  
- End-to-end ownership of core data infrastructure in a high-impact, mission-critical domain.
- Deep exposure to quantitative research workflows, market microstructure, and real-time trading systems.
- Collaboration with elite quantitative researchers, traders, and ML scientists.
- Hands-on experience with cutting-edge distributed systems and time-series data technologies.
- A culture that emphasizes technical excellence, autonomy, and experimentation.
 
 
Qualifications  
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 2+ years of experience building and maintaining production-grade data pipelines.
- Proficiency in Python, SQL, and frameworks such as Airflow, Spark, or Flink.
- Familiarity with cloud storage and compute (S3, GCS, EMR, Dataproc) and versioned data lakes (Delta, Iceberg).
- Experience with financial datasets, tick-level data, or high-frequency time series is a strong plus.
- Strong understanding of data modeling, schema design, and performance optimization.
- Excellent communication skills and the ability to support multidisciplinary teams.