We are looking for a PySpark developer (with an ETL background) to design and build solutions on one of our customer programs.
The role involves building a data standardization and curation layer that will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.
Roles and Responsibilities
- Ability to design, build, and unit test applications in Spark/PySpark
- In-depth knowledge of Hadoop, Spark, and similar frameworks
- Ability to understand existing ETL logic and convert it into Spark/PySpark
- Solid implementation experience with OOP concepts
- Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
- Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Experience working with Bitbucket and CI/CD processes
- Knowledge of agile methodology for project delivery
- Good communication skills
Skills
- Minimum 2 years of extensive experience in the design, build, and deployment of PySpark-based applications
- Expertise in handling complex, large-scale Big Data environments
- Minimum 2 years of experience with Hive, YARN, and HDFS
- Experience working with ETL products, e.g. Ab Initio, Informatica, DataStage, etc.
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities
Experience: 6 to 10 years (early joiners only)
Location: Pune, Chennai, or Hyderabad
Note: We will also consider candidates with strong hands-on experience in Spark and Scala.