
Hadoop and Spark/PySpark developer and analyst with more than 11 years of overall experience as a data engineer in the design, development, deployment and support of large-scale distributed systems such as Cloudera CDH and Hortonworks. Implemented various frameworks for data pipelines and workflows using HBase and Kafka with Spark/PySpark, Python and Scala. Implemented frameworks to import and export data between Hadoop and RDBMS, and to extract, transform and load (ETL) data from various sources. Excellent understanding of Hadoop architecture and its underlying framework, including storage management in Cloudera CDH/Hortonworks. Expertise in using Hadoop ecosystem components such as Hive, ZooKeeper, HBase, Kafka, Sqoop, Oozie and Spark/PySpark for data storage and analysis. Experience developing custom UDFs for Spark/PySpark and Hive to incorporate custom logic. Experienced in running queries using Hive and Impala, and in using BI tools to run ad hoc queries directly on Hadoop in Cloudera CDH/Hortonworks and AWS. Good experience with the Oozie framework and with automating daily import jobs. Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive and Spark/PySpark. Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop in Cloudera CDH/Hortonworks. Collected log data from various sources and integrated it into HDFS using Flume. Assisted the deployment team in setting up Hadoop clusters and services. Good experience generating statistics, extracts and reports from Hadoop. Good understanding of NoSQL databases, with hands-on experience writing applications on NoSQL databases such as HBase. Designed and implemented a product search service using Apache Solr. Good knowledge of querying data from HBase for searching, grouping and sorting.
Good knowledge of AWS services such as EMR and EC2 for fast and efficient big data processing. Good knowledge of Azure services such as Azure Databricks, Event Hubs and Azure Data Lake for fast and efficient big data processing. Good knowledge of benchmarking and performance tuning of clusters in Cloudera CDH/Hortonworks. Experienced in identifying areas for improving system stability and in providing end-to-end high-availability architectural solutions. Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.