We are looking for a PySpark developer (with an ETL background) to design and build solutions on one of our customer programs.
The goal is to build a standardized data and curation layer that will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.
  
   Roles and Responsibilities  
    - Ability to design, build, and unit test applications in Spark/PySpark
    - In-depth knowledge of Hadoop, Spark, and similar frameworks
    - Ability to understand existing ETL logic and convert it into Spark/PySpark
    - Good implementation experience with OOP concepts
    - Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
    - Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources
    - Experience working with Bitbucket and CI/CD processes
    - Knowledge of agile methodology for delivering projects
    - Good communication skills
   Skills  
    - Minimum 2 years of extensive experience in the design, build, and deployment of PySpark-based applications
    - Expertise in handling complex, large-scale Big Data environments
    - Minimum 2 years of experience with Hive, YARN, and HDFS
    - Experience working with ETL products, e.g. Ab Initio, Informatica, DataStage
    - Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities
   Experience: 6 to 10 years (EARLY JOINERS ONLY)
   
   Location: Pune, Chennai, or Hyderabad
   
   Note: We will also consider candidates with good hands-on experience in Spark and Scala.