PABBA SANKAR

Lead Data Engineer
Hyderabad

Summary

Experienced Data Engineer specializing in designing and implementing scalable data pipelines and solutions using AWS, PySpark, and Python. Proven track record in building robust ETL frameworks, real-time and batch data ingestion pipelines, and managing large-scale data processing through distributed systems. Proficient in containerizing applications with Docker and deploying data workflows on cloud-native infrastructure. Skilled in handling diverse data types, including semi-structured and unstructured text data, with practical experience in developing predictive analytics models and conducting text mining for NLP-based applications. Collaborative team player ensuring data availability, quality, and actionable insights for business stakeholders.

Overview

12 years of professional experience

Work History

Sr. Technical Lead

LTIMindtree
08.2022 - Current

Asst. Manager-Analytics

Concentrix Daksh Services India Pvt Ltd
08.2017 - 08.2022

Sr. Data Engineer

IL&FS Technologies Pvt Ltd
07.2016 - 08.2017

Sr. Software Engineer

Shopalyst Technologies
05.2016 - 07.2016

Lead Engineer

HCL Technologies
01.2013 - 04.2016

Education

M.C.A.

Holy Mary Institute of Technology And Science

B.Sc. - M.P.C.

J.R.N. Rajasthan Vidyapeeth University

12th

Board of Intermediate

10th

Board of School Secondary Education

Skills

  • Spark
  • Hive
  • Kafka
  • Python
  • R
  • Django
  • MySQL
  • PyCharm
  • RStudio
  • R Shiny
  • Logistic Regression
  • SVM
  • Random forests
  • Decision Tree
  • Topic Summarization
  • Topic Mining
  • SQL
  • PySpark
  • Hadoop Ecosystem
  • HBase
  • AWS
  • S3
  • ECR
  • CloudWatch
  • EC2
  • EMR
  • Athena
  • Lambda
  • SparkSQL
  • Databricks
  • Shell script
  • Git
  • Jenkins
  • Docker
  • Airflow
  • Bitbucket
  • Jira

Accomplishments

  • Filed a patent on Defect Classification and Association in a Software Development Environment.
  • Wrote technical blogs for HCL on call volume prediction and time series analysis using machine learning algorithms.

Technical Knowledge

Apache Hadoop, HDFS, Spark, Hive, Kafka, Python, R, Java/J2SE, Coginiti Pro, SAS-EG, Django, Servlets, JSP, HTML, JavaScript, XML, Tomcat 9, MySQL, PyCharm, Spyder, RStudio, Eclipse, R Shiny, Power BI, Tableau

Key Projects Handled

  • S&P Global, DI Engine, Sr. Technical Lead, SQL, PySpark, Python, pandas, Hadoop Ecosystem (HDFS, Hive, Oozie, Kafka, HBase), Agile (Value-Driven Delivery), AWS (S3, ECR, CloudWatch, EC2, EMR, Athena, Lambda), SparkSQL, Databricks, PostgreSQL, Shell script, PyCharm, IntelliJ, Git, Jenkins, Docker, Airflow, 12, Architected and implemented an extensible, production-grade data ingestion and transformation framework that processes API response datasets stored in AWS S3 in CSV, nested JSON, and Excel formats. The platform parses its inputs by flattening complex JSON hierarchies, extracting selected tabs and cells from Excel, and reshaping data through pivoting and unpivoting. It then applies transformation logic such as standardized date-time formatting, expression-based column derivations, string operations, type validation, joins, unions, window functions, and generation of unique identifiers. The application supports both real-time (live) and large-scale historical data pipelines and is designed to scale and adapt across business domains. Curated datasets are persisted as Delta tables on external AWS S3 storage, using Delta Lake's ACID compliance, schema evolution, partitioning, and time-travel capabilities to enable high-performance analytics and trusted data consumption across the organization.
  • Aetna, CNX Payment Integrity, Sr. Data Engineer, PySpark, SparkSQL, Hive, Databricks, Oozie, Jira, Bitbucket, Python, R, RStudio, HDFS, 8, Performed exploratory data analysis for a US-based healthcare client using big data tools (Hive, Hadoop, Azure Databricks). CNX Payment Integrity offers a comprehensive suite of payment integrity solutions. The Payment Integrity Enterprise suite is designed to provide analysts, investigators, managers, policy makers, and stakeholders with insights that help health and human services agencies address fraud, waste, abuse, and overpayment. It includes packaged, reusable analytics models for industry-standard solutions, and ACAS (Automated Claim Auditing System), which analyses historical claims and transactions to identify potential savings from overpayment recovery and reduced administrative costs.
  • Tufts, CNX Payment Integrity, Sr. Data Engineer, PySpark, SparkSQL, MySQL, Azure Data Lake, Databricks, Oozie, Jira, GitHub, Azure Data Factory, 4, Performed exploratory data analysis for a US-based healthcare client using big data tools (Hive, Hadoop, Azure Databricks). CNX Payment Integrity offers a comprehensive suite of payment integrity solutions. The Payment Integrity Enterprise suite is designed to provide analysts, investigators, managers, policy makers, and stakeholders with insights that help health and human services agencies address fraud, waste, abuse, and overpayment. It includes packaged, reusable analytics models for industry-standard solutions, and ACAS (Automated Claim Auditing System), which analyses historical claims and transactions to identify potential savings from overpayment recovery and reduced administrative costs.
  • Internal Project (Product), Concentrix Insights Platform, Sr. Data Engineer, Python, J2EE, Django, HTML, JavaScript, PySpark, Power BI, 4, Developed an enterprise-wide self-service analytics platform that brings data, analytics, tools, and processes under one roof for customer deliverables. The secure, automated platform mines and analyses data to give a 360-degree view, with embedded and reusable models that let analysts focus on driving business value and insights.

Personal Information

Date of Birth: 03/16/85
