Responsibilities:
- Develop and maintain data processing workflows using Apache Spark and Scala (see the illustrative sketch after this list)
- Implement batch and streaming data pipelines
- Optimize Spark jobs for performance and scalability
- Collaborate with data engineers and analysts to deliver data solutions
- Debug and resolve issues in production big data environments
- Integrate with data storage and messaging systems such as HDFS, Kafka, and NoSQL databases
- Write clean, maintainable code with best practices in mind
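For context on the day-to-day work, here is a minimal, illustrative Scala sketch of a batch pipeline of the kind described above: it reads Parquet data from HDFS, aggregates it with the Spark DataFrame API, and writes the result back. The object name, paths, and column names (events, userId, amount) are placeholders, not part of the role description.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyTotalsJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-totals")
          .getOrCreate()

        // Read Parquet input from HDFS (path is a placeholder).
        val events = spark.read.parquet("hdfs:///data/events")

        // Aggregate with the DataFrame API (column names are placeholders).
        val totals = events
          .groupBy(col("userId"))
          .agg(sum(col("amount")).as("totalAmount"))

        // Write the result back as Parquet (path is a placeholder).
        totals.write
          .mode("overwrite")
          .parquet("hdfs:///data/daily_totals")

        spark.stop()
      }
    }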
Required Skills:
- Strong programming skills in Scala and hands-on experience with Apache Spark
- Knowledge of Spark Core, Spark SQL, Spark Streaming, and MLlib (a streaming sketch follows this list)
- Experience with Hadoop ecosystem components (HDFS, Hive, Kafka)
- Familiarity with functional programming concepts
- Experience with data serialization formats (Parquet, Avro, ORC)
- Working knowledge of version control (Git) and CI/CD practices
- Good problem-solving and communication skills
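The streaming side of the role might look like the following minimal sketch, which assumes the spark-sql-kafka-0-10 connector is on the classpath: it reads events from a Kafka topic with Structured Streaming and writes them to Parquet with checkpointing. The broker address, topic name, and output paths are hypothetical.

    import org.apache.spark.sql.SparkSession

    object EventStreamJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("event-stream")
          .getOrCreate()

        // Subscribe to a Kafka topic (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()

        // Kafka delivers keys and values as bytes; cast the value to a string.
        val events = raw.selectExpr("CAST(value AS STRING) AS json")

        // Write the stream to Parquet with checkpointing (paths are placeholders).
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/stream/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()

        query.awaitTermination()
      }
    }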