Satish Kumar Embadi

Tolichowki, TG

Summary

  • Hadoop and Spark/PySpark developer, analyst, and data engineer with 11+ years of experience designing, developing, deploying, and supporting large-scale distributed systems on Cloudera CDH and Hortonworks.
  • Implemented frameworks for data pipelines and workflows using HBase and Kafka with Spark/PySpark, Python, and Scala.
  • Implemented frameworks to import and export data between Hadoop and RDBMS, and to extract, transform, and load data from various sources (see the PySpark JDBC sketch below for an analogous approach).
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management, in Cloudera CDH and Hortonworks.
  • Expertise in Hadoop ecosystem components such as Hive, ZooKeeper, HBase, Kafka, Sqoop, Oozie, and Spark/PySpark for data storage and analysis.
  • Experience developing custom UDFs for Spark/PySpark and Hive.
  • Experienced in running queries using Hive and Impala, and in using BI tools to run ad-hoc queries directly on Hadoop in Cloudera CDH, Hortonworks, and AWS.
  • Good experience with the Oozie framework and automating daily import jobs.
  • Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and Spark/PySpark.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop in Cloudera CDH and Hortonworks.
  • Collected log data from various sources and integrated it into HDFS using Flume; assisted the deployment team in setting up Hadoop clusters and services.
  • Good experience generating statistics, extracts, and reports from Hadoop.
  • Good understanding of NoSQL databases, with hands-on experience writing applications on HBase and querying HBase data for searching, grouping, and sorting.
  • Designed and implemented a product search service using Apache Solr.
  • Good knowledge of AWS concepts such as EMR and EC2, and of Azure concepts such as Event Hub and Azure Databricks with Data Lake, for fast and efficient processing of big data.
  • Good knowledge of benchmarking and performance tuning of Cloudera CDH/Hortonworks clusters.
  • Experienced in identifying improvement areas for system stability and providing high-availability architectural solutions.
  • Determined, committed, and hardworking individual with strong communication, interpersonal, and organizational skills.
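Illustrative only: Hadoop-to-RDBMS transfers like the Sqoop work above can also be expressed in PySpark over JDBC. A minimal sketch, assuming a MySQL source; the URL, table, credentials, and output path are hypothetical placeholders, not details from any of the projects described here.

    # Minimal PySpark JDBC read, analogous to a Sqoop import; the URL,
    # table, credentials, and output path are hypothetical. Requires the
    # MySQL JDBC driver on the Spark classpath.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdbms-import").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/sales")  # hypothetical source
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "etl_password")
        .option("numPartitions", 8)             # parallel reads, like Sqoop mappers
        .option("partitionColumn", "order_id")  # split column for parallel reads
        .option("lowerBound", 1)
        .option("upperBound", 1000000)
        .load()
    )

    # Land the imported rows on HDFS in a columnar format.
    orders.write.mode("overwrite").parquet("/data/sales/orders")

The numPartitions option plays the same role as Sqoop's --num-mappers flag: it controls how many parallel slices the source table is read in.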

Overview

13 years of professional experience

Work History

Technical Analyst

SenecaGlobal IT Services Private Ltd
03.2018 - Current
  • Implemented a generic, highly available ETL framework for ingesting data into Hadoop and HBase from various sources using Spark/PySpark (see the partitioned-load sketch at the end of this section)
  • Performed bulk load and transformations on HBase
  • Implemented various Data Modeling techniques for HBase
  • Joined various tables in HBase using Spark/PySpark and Scala and ran analytics on top of them
  • Participated in various upgrade and troubleshooting activities across the enterprise
  • Performed performance troubleshooting and tuning of Hadoop clusters
  • Applied advanced Spark/PySpark techniques such as text analytics using in-memory processing
  • Implemented frameworks on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop
  • Created an architecture stack blueprint for data access with the NoSQL database HBase
  • Brought data from various sources into Hadoop and HBase using Kafka
  • Experienced in using Oozie Operational Services for coordinating the cluster and scheduling workflows
  • Applied Spark/PySpark Streaming for real-time data transformation
  • Installed and configured Hive and wrote Hive UDFs using Spark/PySpark
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau
  • Implemented Composite server for the data virtualization needs and created multiple views for restricted data access using a REST API
  • Devised and led the implementation of next generation architecture for more efficient data ingestion and processing
  • Created and implemented various shell scripts for automating jobs
  • Employed the Avro format for all data ingestion for faster operation and lower space utilization
  • Experienced in managing and reviewing Hadoop log files
  • Worked in Agile and Kanban environments
  • Worked with Enterprise data support teams to install Hadoop updates, patches, version upgrades as required and fixed problems, which were raised after upgrades
  • Implemented test scripts to support test-driven development and continuous integration
  • Used Spark/PySpark for parallel data processing and better performance
  • Environment: HDP, HDFS, Hive, Pig, Impala, HBase, Spark/PySpark, Scala, Solr, Java, SQL, Zookeeper, Sqoop, Teradata, CentOS.
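A minimal PySpark sketch of the partitioned Hive load pattern described in the bullets above (generic ETL into Hive with dynamic partitions). The table and column names (raw_events, events_by_day, event_id, event_date) are hypothetical placeholders; this is an illustration under those assumptions, not the production framework itself.

    # Minimal sketch: ingest, lightly transform, and write into a Hive
    # table partitioned by date; table and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned-hive-load")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow dynamic partition inserts, per the Hive partitioning bullet.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    raw = spark.table("raw_events")
    cleaned = raw.dropDuplicates(["event_id"]).filter("event_date IS NOT NULL")

    # Each distinct event_date value becomes its own Hive partition,
    # so downstream queries can prune to just the dates they need.
    (
        cleaned.write
        .mode("append")
        .format("hive")
        .partitionBy("event_date")
        .saveAsTable("events_by_day")
    )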

Sr Software Engineer

Valuelabs Solutions
04.2014 - 02.2018
  • Converted Python data science models (recommendation engines, classification models) to PySpark as part of data science projects to handle billions of records (see the PySpark ML sketch at the end of this section)
  • Designed and developed PySpark applications to load data from various APIs into the big data warehouse for analytics against business requirements
  • Designed and developed PySpark applications to convert unstructured data, manipulate it according to business requirements, and make it available to the business team for reports on top of Hive/Impala tables
  • Optimized PySpark data science models to improve their performance and runtime
  • Developed and implemented Oozie coordinators to schedule the data science models around the business use-case scenarios and reduce manual intervention
  • Integrated the Oozie workflow with Automic UC4 to automate both ETL and Hadoop jobs
  • Experienced in importing and exporting data into HDFS and Hive using Sqoop
  • Participated in development/implementation of Cloudera Hadoop environment
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop
  • Integrated HBase as a distributed persistent metadata store to provide metadata resolution for network entities on the network
  • Involved in implementing and integrating NoSQL databases such as HBase
  • Experienced in working with various kinds of data sources such as Teradata and Oracle
  • Successfully loaded files from Teradata to HDFS, and loaded data from HDFS into Hive and Impala
  • Designed and implemented a product search service using Apache Solr/Lucene
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data
  • Environment: CDH 5.0/5.1, MapReduce, HDFS, Hive, Pig, Impala, HBase, Spark/PySpark, Solr, Java, SQL, Tableau, Zookeeper, Sqoop, Teradata, CentOS, Pentaho.
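A minimal sketch of porting a single-machine Python classification model to PySpark ML, as in the first bullet of this role. The input table and columns (user_features, f1-f3, label) are hypothetical placeholders, and distributed logistic regression stands in for whichever classifier the original models used.

    # Minimal sketch: distributed logistic regression in PySpark ML as a
    # stand-in for a ported single-machine classifier; table and column
    # names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = (
        SparkSession.builder
        .appName("model-port")
        .enableHiveSupport()
        .getOrCreate()
    )

    df = spark.table("user_features")

    # Spark ML expects the features as a single vector column.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(df)

    # Score the full table in parallel; results land in a "prediction" column.
    model.transform(df).select("label", "prediction").show(5)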

Sr Java Developer

Olive Technology
09.2011 - 04.2014
  • Followed Agile methodology and was involved in daily SCRUM meetings and sprint planning.
  • Worked with J2EE design patterns (Singleton, Factory, DAO, and Business Delegate); designed single-page applications using AngularJS and implemented all front-end components using Spring MVC.
  • Developed JSPs with custom tag libraries for control of business processes in the middle tier and was involved in their integration.
  • Developed applications on the Spring 3.x framework, utilizing features such as Spring dependency injection, Spring Security, and Spring Web Flow with Spring MVC.
  • Used Spring's dependency injection to inject the entity manager and managed beans into Spring beans.
  • Used scripting elements to write Java code in JSPs.
  • Developed an application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture with JSP as the view.
  • Used Enterprise JavaBeans (EJB) to write the business objects for the application.
  • Used Struts validation controls for server-side validation and JavaScript for client-side validation.
  • Worked on Spring Web Flow with Spring MVC for building flows in the web application.
  • Used the Spring Security framework for login authentication and password hashing.
  • Worked with the Java Message Service (JMS) API to develop a message-oriented middleware (MOM) layer for handling various asynchronous requests.
  • Wrote Spring configuration files to define beans and data sources.
  • Worked extensively on both consuming and producing RESTful web services using JAX-RS and Jersey.
  • Created SQL queries, PL/SQL stored procedures, and functions for the database layer by studying the required business objects, and validated them with stored procedures using Oracle.
  • Developed Java code using IntelliJ and used a multi-module Maven project to integrate the Spring Framework, RESTful APIs, and microservices, deployed on Tomcat Server.
  • Used Log4j to log debug and exception statements.
  • Developed unit test cases and suites on the JUnit framework, using JUnit and Mockito to test the middleware services.
  • Deployed applications into continuous integration environments such as Jenkins to integrate and deploy code for development testing.

Education

Master of Science - Computer Science

Jawaharlal Nehru Technological University
Hyderabad, India
2009

Bachelor of Science - Mathematics

Kakatiya University
Warangal, India
2006

Skills

  • Languages - Scala, Python, SQL, Java, PL/SQL, Linux shell scripts
  • Hadoop Ecosystem - HDFS, Spark/PySpark, YARN, Hive, HBase, Impala, ZooKeeper, Sqoop, Oozie, Drill, Flume, Solr, Avro, AWS, Amazon EC2, S3, Azure Databricks
  • Enterprise Technologies - J2EE, Event Hub, Kinesis, JDBC, Kafka
  • Operating Systems - Windows, Linux, UNIX, macOS
  • IDEs - Eclipse, IntelliJ
  • Relational Databases - Oracle, SQL, DB2, MySQL, Teradata
  • NoSQL databases - HBase
  • Markup Languages - HTML, XHTML, XML, DHTML
  • Build & Management Tools - Ant, Maven, SVN
  • Query Languages - SQL, PL/SQL
  • Methodologies - SDLC, OOAD, Agile
  • Continuous Integration & Containers - Jenkins, Docker, Kubernetes

Timeline

Technical Analyst

SenecaGlobal IT Services Private Ltd
03.2018 - Current

Sr Software Engineer

Valuelabs Solutions
04.2014 - 02.2018

Sr Java Developer

Olive Technology
09.2011 - 04.2014

Master of Science - Computer Science

Jawaharlal Nehru Technological University

Bachelor of Science - Mathematics

Kakatiya University