We are looking for a PySpark developer with an ETL background to design and build solutions on one of our customer programs. The role involves building a data standardization and curation layer that will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.
  
Roles and Responsibilities
- Ability to design, build, and unit test applications in Spark/PySpark
- In-depth knowledge of Hadoop, Spark, and similar frameworks
- Ability to understand existing ETL logic and convert it to Spark/PySpark (see the sketch after this list)
- Good implementation experience with object-oriented programming (OOP) concepts
- Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
- Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Experience working with Bitbucket and CI/CD processes
- Knowledge of Agile methodology for delivering projects
- Good communication skills
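As an illustration of the kind of ETL-to-PySpark conversion work described above, below is a minimal sketch that reads a Hive table, applies standardization logic, and writes compressed output to HDFS. All table, column, and path names are hypothetical.

# Minimal, illustrative PySpark ETL sketch (hypothetical names throughout).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("etl-conversion-sketch")
    .enableHiveSupport()  # assumes a configured Hive metastore
    .getOrCreate()
)

# Extract: read a (hypothetical) raw Hive table.
raw = spark.table("raw_db.transactions")

# Transform: the kind of standardization/curation logic typically
# ported from a legacy ETL tool into PySpark.
curated = (
    raw
    .filter(F.col("amount").isNotNull())
    .withColumn("txn_date", F.to_date("txn_ts"))
    .groupBy("customer_id", "txn_date")
    .agg(F.sum("amount").alias("daily_amount"))
)

# Load: write snappy-compressed Parquet to HDFS, partitioned by date.
(
    curated.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .option("compression", "snappy")
    .parquet("hdfs:///curated/daily_amounts")
)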
Skills
- Minimum 2 years of extensive experience in the design, build, and deployment of PySpark-based applications
- Expertise in handling complex, large-scale Big Data environments
- Minimum 2 years of experience with Hive, YARN, and HDFS
- Experience working with ETL products such as Ab Initio, Informatica, and DataStage
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities (a short example follows this list)
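For context, here is a short example of the kind of SQL work involved, run through Spark; the database, table, and column names are again hypothetical.

# Illustrative only: hypothetical database, table, and column names.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sql-example")
    .enableHiveSupport()  # assumes access to a Hive metastore
    .getOrCreate()
)

# A window-function query of the kind referenced above: rank each
# customer's transactions by amount and keep the top three.
top_txns = spark.sql("""
    SELECT customer_id,
           txn_id,
           amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk
    FROM raw_db.transactions
""").where("rnk <= 3")

# Export the result as headered CSV for a downstream consumer.
top_txns.write.mode("overwrite").option("header", "true").csv("hdfs:///exports/top_txns")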
Experience: 6 to 10 years (EARLY JOINERS ONLY)
    
Location: Pune, Chennai, or Hyderabad
    
Note: We can also consider candidates with strong hands-on experience in Spark and Scala.