Key Responsibilities:
- Design and implement scalable, fault-tolerant data pipelines using Scala, Kafka, and other Big Data technologies.
- Build real-time data streaming applications to process high-volume datasets from various sources.
- Integrate and manage containerized applications using Docker and orchestrate deployments with Kubernetes.
- Work with distributed data processing frameworks such as Apache Spark, Flink, or Kafka Streams (an illustrative sketch follows this list).
- Optimize data ingestion, transformation, and storage processes for performance and reliability.
- Collaborate with data scientists, analysts, and backend engineers to meet analytical and operational needs.
- Monitor and troubleshoot production issues in real time, ensuring high availability and low latency.
- Implement data quality, observability, and monitoring best practices across the pipeline.
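
To give a sense of the real-time streaming work described above, below is a minimal Kafka Streams sketch in Scala. The topic names (raw-events, clean-events), the broker address, and the application ID are illustrative assumptions only, not part of this posting, and the example assumes a recent kafka-streams-scala dependency.

```scala
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object ClickstreamFilter extends App {
  // Illustrative configuration; application ID and broker address are assumptions.
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-filter")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder()

  // Read raw events, drop empty records, and forward the cleaned stream.
  builder
    .stream[String, String]("raw-events")
    .filter((_, value) => value != null && value.nonEmpty)
    .to("clean-events")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```

In practice, a topology like this would typically be packaged in a Docker image, deployed to Kubernetes, and monitored with tools such as Prometheus and Grafana, in line with the responsibilities above.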
Qualifications and Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 4+ years of experience in Big Data and streaming platform development.
- Proficient in Scala (required); knowledge of Java or Python is a plus.
- Hands-on experience with Apache Kafka (Kafka Connect, Kafka Streams, or Confluent Platform).
- Strong experience in Docker containerization and Kubernetes orchestration.
- Familiarity with Big Data frameworks: Apache Spark, Hadoop, or Flink.
- Working knowledge of CI/CD pipelines, Git, and monitoring tools (e.g., Prometheus, Grafana).
- Experience with both batch and stream processing architectures.
Desirable Skills and Certifications:
- Experience with cloud data platforms (e.g., AWS EMR, GCP Dataflow, Azure Synapse).
- Understanding of data lake, data warehouse, and data mesh principles.
- Familiarity with NoSQL databases (e.g., Cassandra, HBase, or MongoDB).
- Certifications such as Confluent Certified Developer for Apache Kafka (CCDAK), CKA/CKAD, or Databricks Certified Associate Developer for Apache Spark.
- Knowledge of security and compliance in Big Data environments (encryption, RBAC, auditing).
Skills Required
Scala, Apache Kafka, Docker, Kubernetes, Apache Spark, Git, NoSQL, Azure Synapse