Satish Kumar Embadi

Tolichowki, TG

Summary

  • Hadoop and Spark/PySpark developer, analyst, and data engineer with 11+ years of experience designing, developing, deploying, and supporting large-scale distributed systems on Cloudera CDH and Hortonworks.
  • Implemented frameworks for data pipelines and workflows using HBase and Kafka with Spark/PySpark, Python, and Scala.
  • Implemented frameworks to import and export data between Hadoop and RDBMS, and to extract, transform, and load data from various sources (see the PySpark JDBC sketch below for an analogous approach).
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management, in Cloudera CDH and Hortonworks.
  • Expertise in Hadoop ecosystem components such as Hive, ZooKeeper, HBase, Kafka, Sqoop, Oozie, and Spark/PySpark for data storage and analysis.
  • Experience developing custom UDFs for Spark/PySpark and Hive.
  • Experienced in running queries using Hive and Impala, and in using BI tools to run ad-hoc queries directly on Hadoop in Cloudera CDH, Hortonworks, and AWS.
  • Good experience with the Oozie framework and automating daily import jobs.
  • Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and Spark/PySpark.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop in Cloudera CDH and Hortonworks.
  • Collected log data from various sources and integrated it into HDFS using Flume; assisted the deployment team in setting up Hadoop clusters and services.
  • Good experience generating statistics, extracts, and reports from Hadoop.
  • Good understanding of NoSQL databases, with hands-on experience writing applications on HBase and querying HBase data for searching, grouping, and sorting.
  • Designed and implemented a product search service using Apache Solr.
  • Good knowledge of AWS concepts such as EMR and EC2, and of Azure concepts such as Event Hub and Azure Databricks with Data Lake, for fast and efficient processing of big data.
  • Good knowledge of benchmarking and performance tuning of Cloudera CDH/Hortonworks clusters.
  • Experienced in identifying improvement areas for system stability and providing high-availability architectural solutions.
  • Determined, committed, and hardworking individual with strong communication, interpersonal, and organizational skills.
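Illustrative only: Hadoop-to-RDBMS transfers like the Sqoop work above can also be expressed in PySpark over JDBC. A minimal sketch, assuming a MySQL source; the URL, table, credentials, and output path are hypothetical placeholders, not details from any of the projects described here.

    # Minimal PySpark JDBC read, analogous to a Sqoop import; the URL,
    # table, credentials, and output path are hypothetical. Requires the
    # MySQL JDBC driver on the Spark classpath.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdbms-import").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/sales")  # hypothetical source
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "etl_password")
        .option("numPartitions", 8)             # parallel reads, like Sqoop mappers
        .option("partitionColumn", "order_id")  # split column for parallel reads
        .option("lowerBound", 1)
        .option("upperBound", 1000000)
        .load()
    )

    # Land the imported rows on HDFS in a columnar format.
    orders.write.mode("overwrite").parquet("/data/sales/orders")

The numPartitions option plays the same role as Sqoop's --num-mappers flag: it controls how many parallel slices the source table is read in.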

Overview

13 years of professional experience

Work History

Technical Analyst

SenecaGlobal IT Services Private Ltd
03.2018 - Current
  • Implemented a generic, highly available ETL framework for ingesting data into Hadoop and HBase from various sources using Spark/PySpark (see the partitioned-load sketch at the end of this section)
  • Performed bulk load and transformations on HBase
  • Implemented various Data Modeling techniques for HBase
  • Joined various tables in HBase using Spark/PySpark and Scala and ran analytics on top of them
  • Participated in various upgrade and troubleshooting activities across the enterprise
  • Performed performance troubleshooting and tuning of Hadoop clusters
  • Applied advanced Spark/PySpark techniques such as text analytics using in-memory processing
  • Implemented frameworks on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop
  • Created an architecture stack blueprint for data access with the NoSQL database HBase
  • Brought data from various sources into Hadoop and HBase using Kafka
  • Experienced in using Oozie Operational Services for coordinating the cluster and scheduling workflows
  • Applied Spark/PySpark Streaming for real-time data transformation
  • Installed and configured Hive and wrote Hive UDFs using Spark/PySpark
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau
  • Implemented Composite server for the data virtualization needs and created multiple views for restricted data access using a REST API
  • Devised and led the implementation of next generation architecture for more efficient data ingestion and processing
  • Created and implemented various shell scripts for automating jobs
  • Employed the Avro format for all data ingestion for faster operation and lower space utilization
  • Experienced in managing and reviewing Hadoop log files
  • Worked in Agile and Kanban environments
  • Worked with Enterprise data support teams to install Hadoop updates, patches, version upgrades as required and fixed problems, which were raised after upgrades
  • Implemented test scripts to support test-driven development and continuous integration
  • Used Spark/PySpark for parallel data processing and better performance
  • Environment: HDP, HDFS, Hive, Pig, Impala, HBase, Spark/PySpark, Scala, Solr, Java, SQL, Zookeeper, Sqoop, Teradata, CentOS.
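A minimal PySpark sketch of the partitioned Hive load pattern described in the bullets above (generic ETL into Hive with dynamic partitions). The table and column names (raw_events, events_by_day, event_id, event_date) are hypothetical placeholders; this is an illustration under those assumptions, not the production framework itself.

    # Minimal sketch: ingest, lightly transform, and write into a Hive
    # table partitioned by date; table and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned-hive-load")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow dynamic partition inserts, per the Hive partitioning bullet.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    raw = spark.table("raw_events")
    cleaned = raw.dropDuplicates(["event_id"]).filter("event_date IS NOT NULL")

    # Each distinct event_date value becomes its own Hive partition,
    # so downstream queries can prune to just the dates they need.
    (
        cleaned.write
        .mode("append")
        .format("hive")
        .partitionBy("event_date")
        .saveAsTable("events_by_day")
    )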

Sr Software Engineer

Valuelabs Solutions
04.2014 - 02.2018
  • Converted Python data science models (recommendation engines, classification models) to PySpark as part of data science projects to handle billions of records (see the PySpark ML sketch at the end of this section)
  • Designed and developed PySpark applications to load data from various APIs into the big data warehouse for analytics against business requirements
  • Designed and developed PySpark applications to convert unstructured data, manipulate it according to business requirements, and make it available to the business team for reports on top of Hive/Impala tables
  • Optimized PySpark data science models to improve their performance and runtime
  • Developed and implemented Oozie coordinators to schedule the data science models around the business use-case scenarios and reduce manual intervention
  • Integrated the Oozie workflow with Automic UC4 to automate both ETL and Hadoop jobs
  • Experienced in importing and exporting data into HDFS and Hive using Sqoop
  • Participated in development/implementation of Cloudera Hadoop environment
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop
  • Integrated HBase as a distributed persistent metadata store to provide metadata resolution for network entities on the network
  • Involved in implementing and integrating NoSQL databases such as HBase
  • Experienced in working with various kinds of data sources such as Teradata and Oracle
  • Successfully loaded files from Teradata to HDFS, and loaded data from HDFS into Hive and Impala
  • Designed and implemented a product search service using Apache Solr/Lucene
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data
  • Environment: CDH 5.0/5.1, MapReduce, HDFS, Hive, Pig, Impala, HBase, Spark/PySpark, Solr, Java, SQL, Tableau, Zookeeper, Sqoop, Teradata, CentOS, Pentaho.
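A minimal sketch of porting a single-machine Python classification model to PySpark ML, as in the first bullet of this role. The input table and columns (user_features, f1-f3, label) are hypothetical placeholders, and distributed logistic regression stands in for whichever classifier the original models used.

    # Minimal sketch: distributed logistic regression in PySpark ML as a
    # stand-in for a ported single-machine classifier; table and column
    # names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = (
        SparkSession.builder
        .appName("model-port")
        .enableHiveSupport()
        .getOrCreate()
    )

    df = spark.table("user_features")

    # Spark ML expects the features as a single vector column.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(df)

    # Score the full table in parallel; results land in a "prediction" column.
    model.transform(df).select("label", "prediction").show(5)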

Sr Java Developer

Olive Technology
09.2011 - 04.2014
  • Followed Agile methodology and was involved in daily SCRUM meetings and sprint planning.
  • Worked with J2EE design patterns (Singleton, Factory, DAO, and Business Delegate); designed single-page applications using AngularJS and implemented all front-end components using Spring MVC.
  • Developed JSPs with custom tag libraries for control of business processes in the middle tier and was involved in their integration.
  • Developed applications on the Spring 3.x framework, utilizing features such as Spring dependency injection, Spring Security, and Spring Web Flow with Spring MVC.
  • Used Spring's dependency injection to inject the entity manager and managed beans into Spring beans.
  • Used scripting elements to write Java code in JSPs.
  • Developed an application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture with JSP as the view.
  • Used Enterprise JavaBeans (EJB) to write the business objects for the application.
  • Used Struts validation controls for server-side validation and JavaScript for client-side validation.
  • Worked on Spring Web Flow with Spring MVC for building flows in the web application.
  • Used the Spring Security framework for login authentication and password hashing.
  • Worked with the Java Message Service (JMS) API to develop a message-oriented middleware (MOM) layer for handling various asynchronous requests.
  • Wrote Spring configuration files to define beans and data sources.
  • Worked extensively on both consuming and producing RESTful web services using JAX-RS and Jersey.
  • Created SQL queries, PL/SQL stored procedures, and functions for the database layer by studying the required business objects, and validated them with stored procedures using Oracle.
  • Developed Java code using IntelliJ and used a multi-module Maven project to integrate the Spring Framework, RESTful APIs, and microservices, deployed on Tomcat Server.
  • Used Log4j to log debug and exception statements.
  • Developed unit test cases and suites on the JUnit framework, using JUnit and Mockito to test the middleware services.
  • Deployed applications into continuous integration environments such as Jenkins to integrate and deploy code for development testing.

Education

Master of Science - Computer Science

Jawaharlal Nehru Technological University
Hyderabad, India
2009

Bachelor of Science - Mathematics

Kakatiya University
Warangal, India
2006

Skills

  • Languages - Scala, Python, SQL, Java, PL/SQL, Linux shell scripts
  • Hadoop Ecosystem - HDFS, Spark/PySpark, YARN, Hive, HBase, Impala, ZooKeeper, Sqoop, Oozie, Drill, Flume, Solr, Avro, AWS, Amazon EC2, S3, Azure Databricks
  • Enterprise Technologies - J2EE, Event Hub, Kinesis, JDBC, Kafka
  • Operating Systems - Windows, Linux, UNIX, macOS
  • IDEs - Eclipse, IntelliJ
  • Relational Databases - Oracle, SQL, DB2, MySQL, Teradata
  • NoSQL databases - HBase
  • Markup Languages - HTML, XHTML, XML, DHTML
  • Build & Management Tools - Ant, Maven, SVN
  • Query Languages - SQL, PL/SQL
  • Methodologies - SDLC, OOAD, Agile
  • Continuous Integration & Containers - Jenkins, Docker, Kubernetes

Timeline

Technical Analyst

SenecaGlobal IT Services Private Ltd
03.2018 - Current

Sr Software Engineer

Valuelabs Solutions
04.2014 - 02.2018

Sr Java Developer

Olive Technology
09.2011 - 04.2014

Master of Science - Computer Science

Jawaharlal Nehru Technological University

Bachelor of Science - Mathematics

Kakatiya University