Key Responsibilities:
- Design, architect, and implement end-to-end big data solutions using MapR, Apache Hadoop, and associated ecosystem tools (e.g., Hive, HBase, Spark, Kafka).
- Lead data platform modernization efforts including architecture reviews, platform upgrades, and migrations.
- Collaborate with data engineers, data scientists, and application teams to gather requirements and build scalable, secure data pipelines.
- Define data governance, security, and access control strategies in the MapR ecosystem.
- Optimize performance of distributed systems, including storage and compute workloads.
- Guide teams on best practices in big data development, deployment, and maintenance.
- Conduct code reviews and architecture assessments.
- Mentor junior engineers and provide technical leadership across big data initiatives.
Qualifications and Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 6+ years of experience in data architecture, with at least 3 years working specifically on MapR and Hadoop ecosystems.
- Expertise in MapR-DB, MapR Streams, and MapR-FS.
- Proficiency with big data tools: Apache Spark, Kafka, Hive, HBase, Oozie, Sqoop, Flume.
- Strong programming skills in Java, Scala, or Python.
- Solid understanding of distributed systems, high availability, and cluster management.
- Experience with data ingestion, transformation, and ETL pipelines.
- Familiarity with security controls (Kerberos, Ranger, Knox, etc.) and data governance.
- Experience with CI/CD pipelines, Docker, and Kubernetes is a plus.
Desirable Skills and Certifications:
- Certifications such as Cloudera Certified Professional (CCP), a MapR certification, or the Hortonworks HDP certification.
- Exposure to cloud-based big data platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with NoSQL databases and real-time data streaming architectures.
- Ability to communicate architectural concepts to both technical and non-technical stakeholders.
Skills Required:
NoSQL, Docker, Kubernetes