Job Description
About the Role
As a Quantitative Data Engineer, you will be the backbone of the data ecosystem powering our quantitative research, trading, and AI-driven strategies. You will design, build, and maintain the high-performance data infrastructure that enables low-latency, high-fidelity access to market, fundamental, and alternative data across multiple asset classes.
This role bridges quant engineering, data systems, and research enablement, ensuring that our researchers and traders have fast, reliable, and well-documented datasets for analysis and live trading. You’ll be part of a cross-functional team working at the intersection of finance, machine learning, and distributed systems.
Responsibilities
Architect and maintain scalable ETL pipelines for ingesting and transforming terabytes of structured, semi-structured, and unstructured market and alternative data.
Design time-series-optimized data stores and streaming frameworks to support low-latency data access for both backtesting and live trading.
Develop ingestion frameworks integrating vendor feeds (Bloomberg, Refinitiv, Polygon, Quandl, etc.), exchange data, and internal execution systems.
Collaborate with quantitative researchers and ML teams to ensure data accuracy, feature availability, and schema evolution aligned with modeling needs.
Implement data quality checks, validation pipelines, and version control mechanisms for all datasets.
Monitor and optimize distributed compute environments (Spark, Flink, Ray, or Dask) for performance and cost efficiency.
Automate workflows using orchestration tools (Airflow, Prefect, Dagster) for reliability and reproducibility; a minimal sketch of this kind of pipeline follows this list.
Establish best practices for metadata management, lineage tracking, and documentation.
Contribute to internal libraries and SDKs for seamless data access by trading and research applications.
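To make the orchestration and validation responsibilities above concrete, here is a minimal, illustrative Airflow 2.x sketch of a daily ingest-validate-publish pipeline. The DAG name, schedule, and task bodies are hypothetical placeholders, not a description of our production systems.

```python
# Minimal Airflow sketch of a daily market-data pipeline: ingest -> validate -> publish.
# All dataset names and task bodies are hypothetical placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_vendor_feed(**context):
    # Placeholder: pull the previous trading day's files from a vendor feed into raw storage.
    print(f"ingesting raw equities bars for {context['ds']}")


def validate_dataset(**context):
    # Placeholder: run schema and row-count checks before the data is exposed to research.
    print(f"validating equities bars for {context['ds']}")


def publish_to_lake(**context):
    # Placeholder: write the validated partition to the versioned data lake.
    print(f"publishing equities bars partition {context['ds']}")


with DAG(
    dag_id="equities_bars_daily",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * 1-5",            # weekday mornings, after vendor files land
    catchup=False,
    tags=["market-data", "example"],
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_vendor_feed)
    validate = PythonOperator(task_id="validate", python_callable=validate_dataset)
    publish = PythonOperator(task_id="publish", python_callable=publish_to_lake)

    ingest >> validate >> publish
```

The same ingest-validate-publish shape carries over to Prefect or Dagster; the key point is that validation gates publication, so researchers only ever see checked data.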
In Trading Firms, Data Engineers Typically:
Build real-time data streaming systems to capture market ticks, order books, and execution signals.
Manage versioned historical data lakes for backtesting and model training.
Handle multi-venue data normalization (different exchanges and instruments); see the sketch after this list.
Integrate alternative datasets (satellite imagery, news sentiment, ESG, supply-chain data).
Work closely with quant researchers to convert raw data into research-ready features.
Optimize pipelines for ultra-low latency, where milliseconds can impact P&L.
Implement data observability frameworks to ensure uptime and quality.
Collaborate with DevOps and infra engineers to scale storage, caching, and compute.
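As an illustration of the multi-venue normalization task referenced above, the sketch below maps two hypothetical venue message formats onto a single internal trade schema. The venue names, field names, and unit conventions are assumptions for the example, not real feed specifications.

```python
# Sketch of multi-venue trade normalization: map venue-specific messages onto one
# internal schema. Venue formats and field names here are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedTrade:
    ts_utc: datetime   # event time, UTC
    venue: str         # source venue identifier
    symbol: str        # internal instrument identifier
    price: float
    size: float


def from_venue_a(msg: dict) -> NormalizedTrade:
    # Hypothetical venue A: epoch nanoseconds, prices quoted in integer ticks of 1e-4.
    return NormalizedTrade(
        ts_utc=datetime.fromtimestamp(msg["ts_ns"] / 1e9, tz=timezone.utc),
        venue="VENUE_A",
        symbol=msg["sym"].upper(),
        price=msg["px_ticks"] * 1e-4,
        size=float(msg["qty"]),
    )


def from_venue_b(msg: dict) -> NormalizedTrade:
    # Hypothetical venue B: ISO-8601 timestamps and decimal prices as strings.
    return NormalizedTrade(
        ts_utc=datetime.fromisoformat(msg["timestamp"]).astimezone(timezone.utc),
        venue="VENUE_B",
        symbol=msg["instrument"],
        price=float(msg["price"]),
        size=float(msg["volume"]),
    )


if __name__ == "__main__":
    raw_a = {"ts_ns": 1_700_000_000_000_000_000, "sym": "aapl", "px_ticks": 1_891_200, "qty": 100}
    raw_b = {"timestamp": "2023-11-14T22:13:20+00:00", "instrument": "AAPL", "price": "189.12", "volume": "50"}
    print(from_venue_a(raw_a))
    print(from_venue_b(raw_b))
```

In practice the normalized schema is what downstream feature pipelines and backtests consume, so getting timestamps, units, and symbology consistent across venues is the core of the job.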
Tech Stack
Languages: Python, SQL, Scala, Go, Rust (optional, for HFT pipelines)
Data Processing: Apache Spark, Flink, Ray, Dask, Pandas, Polars
Workflow Orchestration: Apache Airflow, Prefect, Dagster
Databases & Storage: PostgreSQL, ClickHouse, DuckDB, Elasticsearch, Redis
Data Lakes: Delta Lake, Iceberg, Hudi, Parquet (see the query sketch after this list)
Streaming: Kafka, Redpanda, Pulsar
Cloud & Infra: AWS (S3, EMR, Lambda), GCP, Azure, Kubernetes
Version Control & Lineage: DVC, MLflow, Feast, Great Expectations
Visualization / Monitoring: Grafana, Prometheus, Superset, Datadog
Tools for Finance: kdb+/q (for tick data), InfluxDB, QuestDB
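As a small example of how pieces of this stack fit together, the sketch below uses DuckDB to aggregate raw trades stored as Parquet into one-minute OHLCV bars. The file path and column names (ts, symbol, price, size) are assumptions for illustration.

```python
# Sketch: aggregate Parquet trade files into one-minute OHLCV bars with DuckDB.
# The path and column names are hypothetical; swap in your own lake layout.
import duckdb

con = duckdb.connect()  # in-memory database
bars = con.execute(
    """
    SELECT
        symbol,
        date_trunc('minute', ts) AS minute,
        arg_min(price, ts)       AS open,   -- price at the earliest tick in the minute
        max(price)               AS high,
        min(price)               AS low,
        arg_max(price, ts)       AS close,  -- price at the latest tick in the minute
        sum(size)                AS volume
    FROM read_parquet('trades/2024-01-02/*.parquet')
    GROUP BY symbol, minute
    ORDER BY symbol, minute
    """
).df()
print(bars.head())
```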
What You Will Gain
End-to-end ownership of core data infrastructure in a high-impact, mission-critical domain.
Deep exposure to quantitative research workflows, market microstructure, and real-time trading systems.
Collaboration with elite quantitative researchers, traders, and ML scientists.
Hands-on experience with cutting-edge distributed systems and time-series data technologies.
A culture that emphasizes technical excellence, autonomy, and experimentation.
Qualifications
Bachelor’s or Master’s in Computer Science, Data Engineering, or a related field.
2+ years of experience building and maintaining production-grade data pipelines.
Proficiency in Python, SQL, and frameworks such as Airflow, Spark, or Flink.
Familiarity with cloud storage and compute (S3, GCS, EMR, Dataproc) and versioned data lakes (Delta, Iceberg).
Experience with financial datasets, tick-level data, or high-frequency time series is a strong plus.
Strong understanding of data modeling, schema design, and performance optimization.
Excellent communication skills and the ability to support multidisciplinary teams.