PRANAY REDDY DAVA

Hyderabad

Summary

Practical Data Engineer with in-depth knowledge of data manipulation techniques and programming, paired with expertise in integrating new software packages and products into existing systems. Offering a 7-year background managing the development, design, and delivery of data solutions. Tech-savvy, independent professional with strong communication and organizational skills.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Theatro
04.2024 - Current
  • Spearheaded data collection from 14,000 servers, ensuring seamless data ingestion from RabbitMQ into Linux servers for preprocessing.
  • Led the transformation of device and usage statistics data, leveraging Azure Databricks (Apache Spark) for large-scale data processing and analytics.
  • Utilized Azure cloud services (Azure Synapse Analytics, Azure Data Factory, Azure Databricks) for data integration, transformation, and storage.
  • Delivered high-performance data processing solutions, handling millions of records per minute with low latency and high availability.
  • Built a proof of concept (POC) on Azure Stream Analytics that consumes data directly from RabbitMQ and processes it in a Stream Analytics job.

Senior Data Engineer

Verana Health
08.2021 - 04.2024
  • Migrated AWS Glue pipelines to Kubernetes using AWS EKS, ECR, and Airflow.
  • Orchestrated ETL pipelines in Glue using a parent-child job architecture, and built serverless workflows using AWS Lambda and AWS Step Functions.
  • Designed and developed AWS microservices-based, event-driven data ingestion and normalization processes, powered by an internal web application for managing, orchestrating, and executing ETL for a variety of incoming data sources.
  • Implemented flexible, scalable structures that resolve differences between healthcare data systems.
  • Designed and built an ETL Pipeline to manage source-to-target data ingestion for over 3,000 future ingestion streams.
  • Developed the AWS ADX platform for commercializing Verana's data modules, which generated $25 million in revenue in a year.
  • Developed an internal tool for the Zendesk application that retrieves data based on the patient information in Zendesk forms, reducing the tier-2 support team's workload by 80%.
  • Developed and optimized Apache Spark jobs in Databricks for large-scale data processing tasks, achieving a 40% reduction in processing time and enhancing overall system efficiency.
  • Conducted performance tuning and optimization of Databricks clusters on AWS, optimizing resource utilization and reducing infrastructure costs by 20%.

Data Engineer

Tech Mahindra
07.2018 - 12.2019
  • Led the development of robust data products, including attribute stores and periodic facts, enabling self-service analytics.
  • Used Kafka and Spark for real-time and batch data processing, ingesting large volumes of data from different sources into HDFS using Kafka.
  • Queried data using Spark SQL for faster processing of data sets, offloaded data from the EDW into a Hadoop cluster using PySpark, and developed PySpark scripts for importing and exporting data into HDFS and Hive.
  • Leveraged Apache Spark and Cloudera Data Engineering (CDE) to optimize data processing jobs, achieving a 40% improvement in runtime performance and supporting more complex data analysis and reporting needs.
  • Implemented data warehousing concepts and ETL processes for efficient data processing.
  • Conducted code reviews and knowledge-sharing sessions to promote good development methodologies.

Data Support Engineer

Amazon
07.2017 - 07.2018
  • Collected and transformed data daily for use in improving India geocodes.
  • As part of a team effort, improved the accuracy of India geocode correction from 30% to 70%.
  • Extracted and analyzed daily and weekly reports of package deliveries from the warehouse.

Education

Master of Science - Data Science

University At Buffalo
06-2021

Skills

  • Python

  • Docker

  • PySpark

  • AWS

  • Kubernetes

  • SQL

  • Airflow

  • Pandas

  • ETL

  • Terraform

  • TensorFlow

  • Shell Scripting

  • Azure

  • RabbitMQ

University Projects

Action Recognition using RNN (TENSORFLOW)


  • Built an LSTM model for recognizing the action performed in a video (streaming data).
  • Used a pre-trained Inception V3 model (transfer learning) to extract features from the sequential video data.
  • Achieved a validation accuracy of 0.75 and a top-2 categorical accuracy of 0.87.


Convolutional Neural Network Architectures (TENSORFLOW)


  • Implemented VGGNet-16, ResNet-18, and Inception-V2.
  • Implemented a total of 18 variants of the above architectures by changing the activation (ReLU, LeakyReLU), regularization (Dropout, BatchNormalization), and optimizer (SGD, Adam).
  • Observed the performance of these models on the CIFAR100 dataset.


Sparkify Data Warehouse (Python, Airflow, Spark, Amazon Redshift, S3)


Built a data pipeline using Apache Airflow and Spark to ingest data from a fictional music streaming app called Sparkify into an Amazon Redshift data warehouse, which can be used to find insights into what songs users are listening to.
