Job Description
            
                Candidate should be able to:
 Coordinate Development, Integration, and Production deployments.
 Optimize Spark code, Impala queries, and Hive partitioning strategy for better scalability, reliability, and performance.
 Build applications using Maven or SBT and integrate them with continuous integration servers such as Jenkins to automate builds.
 Run Hadoop ecosystem jobs and applications through Hue.
 Build machine learning algorithms using Spark.
 Migrate data from legacy RDBMS databases to the Hadoop ecosystem (a brief migration sketch follows this list).
 Create mapping documents to outline data flow from source to target.
 Use Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
 Design and deploy scalable, enterprise-wide operations.
 Work with leading BI technologies such as MicroStrategy (MSTR) and Tableau over the Hadoop ecosystem through ODBC/JDBC connections.
 Perform performance tuning of Impala queries.
 Work on Hive performance optimizations such as using the distributed cache for small datasets, partitioning, bucketing, and map-side joins (a Hive optimization sketch follows this list).
 Create various database objects like tables, views, functions, and triggers using SQL 
 Understand business needs, analyze functional specifications, and map them to the design and development of Apache Spark programs and algorithms.
 Install, configure, and use Hadoop components such as Spark, Spark Job Server, Spark Thrift Server, Phoenix on HBase, Flume, and Sqoop.
 Write Spark jobs to fetch large data volumes from source systems.
 Prepare technical specifications, analyze functional specifications, and develop and maintain code.
 Develop end-to-end data pipelines using Spark, Hive, and Impala.
 Track and document operational problems, following standards and procedures, using the reporting tool JIRA.
 Use REST services to access HBase data and use that data for further processing in downstream systems.
 Wrangle data into workable datasets, working with file formats such as Parquet, ORC, and SequenceFiles and serialization formats such as Avro.
 Perform feasibility analysis for the deliverables, evaluating requirements against complexity and timelines.
 Complete the full software development lifecycle and deliver on time.
 Work with end users to gather requirements and convert them into working documents.
 Interface with various solution and business areas to understand requirements and prepare documentation to support development.
 Work in a fast-paced, team-oriented environment.
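
 For the RDBMS-to-Hadoop migration duty above, a minimal Spark/Scala sketch of one common approach (Spark's JDBC reader landing data as a Parquet-backed Hive table); the connection URL, table names, credentials, and partition bounds are placeholders, not details of this role:

    import org.apache.spark.sql.SparkSession

    object RdbmsToHadoopSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rdbms-to-hadoop-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read the legacy table over JDBC, splitting the load across executors.
        val customers = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//legacy-db:1521/ORCL") // placeholder URL
          .option("dbtable", "CRM.CUSTOMERS")                       // placeholder table
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .option("partitionColumn", "CUSTOMER_ID")
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", "8")
          .load()

        // Land the data as Parquet in a Hive-managed table so Hive and Impala can query it.
        customers.write.format("parquet").saveAsTable("staging.customers")
      }
    }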
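
 For the Hive optimization duty above, a minimal Spark/Scala sketch of a map-side (broadcast) join plus a partitioned, bucketed table write; the table and column names (sales, dim_store, store_id, event_date) are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object HiveOptimizationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-optimization-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val sales  = spark.table("sales")      // large fact table
        val stores = spark.table("dim_store")  // small dimension table

        // Map-side join: broadcasting the small dimension avoids shuffling the large fact table.
        val enriched = sales.join(broadcast(stores), Seq("store_id"))

        // Partition by date and bucket by store_id so downstream Hive/Impala queries
        // can prune partitions and skip full scans.
        enriched.write
          .partitionBy("event_date")
          .bucketBy(32, "store_id")
          .sortBy("store_id")
          .format("parquet")
          .saveAsTable("sales_enriched")
      }
    }
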
 Candidate should have:
 Experience in Hadoop, HBase, MongoDB, or other NoSQL platforms
 Experience with Spark and Spark SQL 
 Excellent communication skills with both technical and business audiences
 Hands-on experience in Java, Spark, Scala, Akka, Hive, Maven/SBT, and Amazon S3
 Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala 
 Good experience debugging issues using Hadoop and Spark log files
 Knowledge of Sqoop and Flume preferred
 Experience in Kafka and REST services is a plus
 Experience in Apache Phoenix and text search (Solr, Elasticsearch, CloudSearch)
 Expertise in shell scripts, cron automation, and regular expressions
 3+ years of strong native SQL skills
 1+ years of experience with Hadoop, Hive, Impala, HBase, and related technologies: MapReduce/YARN, Lambda architectures, MPP shared-nothing database systems, and NoSQL systems
 3+ years of experience with Scala, Spark, and Linux