Job Description
            
About the Role

As a Quantitative Data Engineer, you will be the backbone of the data ecosystem powering our quantitative research, trading, and AI-driven strategies.
You will design, build, and maintain the high-performance data infrastructure that enables low-latency, high-fidelity access to market, fundamental, and alternative data across multiple asset classes.

This role bridges quant engineering, data systems, and research enablement, ensuring that our researchers and traders have fast, reliable, and well-documented datasets for analysis and live trading.
You'll be part of a cross-functional team working at the intersection of finance, machine learning, and distributed systems.

Responsibilities
- Architect and maintain scalable ETL pipelines for ingesting and transforming terabytes of structured, semi-structured, and unstructured market and alternative data.
- Design time-series-optimized data stores and streaming frameworks to support low-latency data access for both backtesting and live trading.
- Develop ingestion frameworks integrating vendor feeds (Bloomberg, Refinitiv, Polygon, Quandl, etc.), exchange data, and internal execution systems.
- Collaborate with quantitative researchers and ML teams to ensure data accuracy, feature availability, and schema evolution aligned with modeling needs.
- Implement data quality checks, validation pipelines, and version control mechanisms for all datasets (a minimal sketch follows this list).
- Monitor and optimize distributed compute environments (Spark, Flink, Ray, or Dask) for performance and cost efficiency.
- Automate workflows using orchestration tools (Airflow, Prefect, Dagster) for reliability and reproducibility.
- Establish best practices for metadata management, lineage tracking, and documentation.
- Contribute to internal libraries and SDKs for seamless data access by trading and research applications.
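For illustration, here is a minimal, hypothetical sketch of the kind of validated ingestion step described above, in Python with pandas. The vendor file name, column schema, and function names are invented for the example; in practice a step like this would typically run as an Airflow, Prefect, or Dagster task.

```python
"""Hypothetical sketch: ingest a vendor CSV of daily bars, validate it, write Parquet."""
from pathlib import Path

import pandas as pd

# Invented schema for the example.
EXPECTED_COLUMNS = {"symbol", "ts", "open", "high", "low", "close", "volume"}


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Basic data-quality gates: schema drift, nulls, and timestamp ordering."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"schema drift: missing columns {missing}")
    if df[["symbol", "ts", "close"]].isna().any().any():
        raise ValueError("null values in key columns")
    # Keep rows ordered per symbol so downstream time-series code can assume it.
    return df.sort_values(["symbol", "ts"]).reset_index(drop=True)


def ingest(raw_csv: Path, out_dir: Path) -> Path:
    """One idempotent pipeline step: parse, validate, persist as Parquet."""
    df = pd.read_csv(raw_csv, parse_dates=["ts"])
    df = validate(df)
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / f"bars_{df['ts'].dt.date.max()}.parquet"
    df.to_parquet(out, index=False)  # columnar layout for fast research reads
    return out


if __name__ == "__main__":
    print(ingest(Path("vendor_bars.csv"), Path("lake/daily_bars")))
```

Failing loudly on schema drift or null keys, rather than writing partial data, is what makes a step like this safe to retry under an orchestrator.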
In Trading Firms, Data Engineers Typically:
- Build real-time data streaming systems to capture market ticks, order books, and execution signals (see the sketch at the end of this posting).
- Manage versioned historical data lakes for backtesting and model training.
- Handle multi-venue data normalization (different exchanges and instruments).
- Integrate alternative datasets (satellite imagery, news sentiment, ESG, supply-chain data).
- Work closely with quant researchers to convert raw data into research-ready features.
- Optimize pipelines for ultra-low latency, where milliseconds can impact P&L.
- Implement data observability frameworks to ensure uptime and quality.
- Collaborate with DevOps and infra engineers to scale storage, caching, and compute.

Tech Stack
- Languages: Python, SQL, Scala, Go, Rust (optional, for HFT pipelines)
- Data Processing: Apache Spark, Flink, Ray, Dask, Pandas, Polars
- Workflow Orchestration: Apache Airflow, Prefect, Dagster
- Databases & Storage: PostgreSQL, ClickHouse, DuckDB, Elasticsearch, Redis
- Data Lakes: Delta Lake, Iceberg, Hudi, Parquet
- Streaming: Kafka, Redpanda, Pulsar
- Cloud & Infra: AWS (S3, EMR, Lambda), GCP, Azure, Kubernetes
- Version Control & Lineage: DVC, MLflow, Feast, Great Expectations
- Visualization / Monitoring: Grafana, Prometheus, Superset, Datadog
- Tools for Finance: kdb+/q (for tick data), InfluxDB, QuestDB

What You Will Gain
- End-to-end ownership of core data infrastructure in a high-impact, mission-critical domain.
- Deep exposure to quantitative research workflows, market microstructure, and real-time trading systems.
- Collaboration with elite quantitative researchers, traders, and ML scientists.
- Hands-on experience with cutting-edge distributed systems and time-series data technologies.
- A culture that emphasizes technical excellence, autonomy, and experimentation.

Qualifications
- Bachelor's or Master's in Computer Science, Data Engineering, or a related field.
- 2+ years of experience building and maintaining production-grade data pipelines.
- Proficiency in Python, SQL, and frameworks like Airflow, Spark, or Flink.
- Familiarity with cloud storage and compute (S3, GCS, EMR, Dataproc) and versioned data lakes (Delta, Iceberg).
- Experience with financial datasets, tick-level data, or high-frequency time series is a strong plus.
- Strong understanding of data modeling, schema design, and performance optimization.
- Excellent communication skills and the ability to support multidisciplinary teams.
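To complement the real-time streaming work described under "In Trading Firms, Data Engineers Typically", here is a hypothetical sketch of a tick consumer using the confluent-kafka Python client (Kafka is listed under Streaming above). The topic name, JSON encoding, consumer group, and field names are assumptions made for the example.

```python
"""Hypothetical sketch: consume normalized market ticks from a Kafka topic.

The topic, group id, and JSON field names are invented for the example; a
production feed would more likely use Avro or Protobuf with a schema registry.
"""
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "tick-archiver",      # a group lets archiver instances share partitions
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["market.ticks"])

try:
    while True:
        msg = consumer.poll(1.0)      # wait up to 1 second for the next record
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        tick = json.loads(msg.value())
        # Downstream: normalize venue-specific fields, append to the tick store,
        # and update order-book state keyed by (venue, symbol).
        print(tick["symbol"], tick["price"], tick["size"])
finally:
    consumer.close()
```

Because Kafka preserves order only within a partition, a feed like this would typically be partitioned by symbol so that per-instrument ticks stay ordered for order-book reconstruction.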