Results-oriented Data Engineer with 3+ years of experience building scalable data pipelines and orchestrating workflows using Azure Data Factory and Azure Databricks. Skilled in developing modern cloud-based data lakehouse architectures with Databricks, Delta Lake, Apache Spark, PySpark, and SQL. Demonstrated success in optimizing ETL pipelines and improving data quality and governance. Deep knowledge of Delta Lake features, including ACID transactions, schema enforcement, and time travel, to support both real-time and batch analytics for informed enterprise decision-making.
Overview
3 years of professional experience
1 Certification
Work History
Data Engineer
Modak Analytics, Syngenta
10.2023 - Current
Engineered real-time data ingestion pipelines using SnapLogic to land raw data in an external AWS S3 bucket as Parquet files, which were then processed in Databricks (see the ingestion sketch following this role).
Designed and implemented high-performance ETL pipelines on Azure Databricks, processing structured and semi-structured data from multiple enterprise systems.
Implemented a robust incremental data loading mechanism using Delta Lake within the Medallion Architecture, applying incremental loads across 40+ tables to preserve data integrity and significantly reduce processing times (see the merge sketch following this role).
Created and optimized PySpark jobs in Databricks to process and transform large datasets for downstream analytics.
Developed a scalable Azure-based data lakehouse architecture using Databricks Delta Lake to support ACID transactions, schema evolution, and time travel for reliable data versioning.
Implemented comprehensive data quality frameworks within Databricks to ensure accuracy, consistency, and integrity across both batch and streaming pipelines (see the data-quality sketch following this role).
Created detailed documentation for the entire data engineering process, ensuring easy knowledge transfer and onboarding for future projects.
Developed and conducted training sessions for team members, empowering them with the skills needed to effectively utilize the implemented data engineering solutions.
Collaborated in Agile development processes, including sprint planning, backlog grooming, and daily stand-ups, to ensure the timely delivery of projects.
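Ingestion sketch (illustrative): a minimal PySpark batch read of SnapLogic-landed Parquet files from S3 into a bronze Delta table; the bucket path and table names are placeholders, not the actual project values.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read SnapLogic-landed Parquet files from the (hypothetical) S3 landing path.
raw_df = spark.read.format("parquet").load("s3://example-landing-bucket/snaplogic/raw/")

# Append into a bronze-layer Delta table (placeholder name).
raw_df.write.format("delta").mode("append").saveAsTable("bronze.raw_events")
```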
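Merge sketch (illustrative): a minimal Delta Lake upsert from bronze to silver, the pattern behind the incremental loads; the table and key names are assumptions for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch from the bronze layer (placeholder table name).
updates = spark.read.table("bronze.raw_events")

# Target silver table (placeholder name); MERGE provides ACID upserts.
silver = DeltaTable.forName(spark, "silver.events")

(
    silver.alias("t")
    .merge(updates.alias("s"), "t.event_id = s.event_id")  # assumed business key
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```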
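Data-quality sketch (illustrative): a lightweight completeness/uniqueness gate of the kind such a framework runs; the rules and table name are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("silver.events")  # placeholder table

# Completeness: the key column must not contain nulls.
null_keys = df.filter(F.col("event_id").isNull()).count()
# Uniqueness: the key column must not contain duplicates.
dupe_keys = df.groupBy("event_id").count().filter(F.col("count") > 1).count()

if null_keys or dupe_keys:
    raise ValueError(f"DQ check failed: {null_keys} null keys, {dupe_keys} duplicates")
```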
Data Engineer
Modak Analytics, Abbvie
04.2022 - 09.2023
Designed and implemented a comprehensive Azure Data Factory solution for ETL processes, utilizing SQL, Data Factory pipelines, Databricks, and PySpark to handle the extraction, transformation, and loading of data from diverse sources.
Implemented security best practices by leveraging Azure Key Vault for storing and managing sensitive information such as credentials and secrets (see the secret-scope sketch following this role).
Incorporated Logic Apps to automate email notifications, enhancing system monitoring and alerting capabilities for timely issue resolution.
Developed and optimized ETL pipelines in Databricks using PySpark and SQL to process and store data.
Developed robust batch processing workflows utilizing Delta Lake’s ACID transactions, schema enforcement, and time travel features to maintain high data quality and governance (see the time-travel sketch following this role).
Designed scalable PySpark/SQL jobs to efficiently ingest, transform, and load data into Delta Lake, enabling seamless analytics and reporting.
Reduced ETL runtime by 50% through advanced partitioning, Spark tuning, and file compaction (see the compaction sketch following this role).
Managed CI/CD using Git and GitLab for seamless deployment of Databricks workflows and code.
Collaborated with cross-functional teams to gather business requirements and translate them into effective data engineering solutions.
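Secret-scope sketch (illustrative): reading a credential from an Azure Key Vault-backed Databricks secret scope; the scope, key, and connection details are placeholders, and `spark`/`dbutils` are notebook-provided globals.

```python
# Fetch the password from a Key Vault-backed secret scope (placeholder names).
jdbc_password = dbutils.secrets.get(scope="kv-etl-scope", key="sql-dw-password")

# Use it for a JDBC extract (hypothetical server, database, and table).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example.database.windows.net:1433;database=dw")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", jdbc_password)
    .load()
)
```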
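Time-travel sketch (illustrative): querying an earlier Delta table version for reproducible batch reruns; the table name, path, and version number are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reprocess against a pinned historical snapshot of the table.
snapshot = spark.sql("SELECT * FROM silver.events VERSION AS OF 42")

# Equivalent path-based read, pinning by version (placeholder path).
snapshot_by_path = (
    spark.read.format("delta")
    .option("versionAsOf", 42)
    .load("/mnt/delta/silver/events")
)
```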
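Compaction sketch (illustrative): the partitioning-plus-OPTIMIZE pattern behind the runtime reduction; the table, partition column, and Z-ORDER column are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("bronze.raw_events")  # placeholder upstream data

# Write partitioned by a commonly filtered column to enable partition pruning.
(
    df.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")      # assumed filter column
    .saveAsTable("silver.events")   # placeholder table
)

# Compact small files and co-locate rows on a high-cardinality join key.
spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
```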
Education
Bachelor of Technology (B.Tech) - Information Technology
Gokaraju Rangaraju Institute of Engineering And Technology
Hyderabad
05.2022
Skills
Platforms:
Azure, Azure Databricks (ADB), Azure Data Lake Gen2 (ADLS), Azure SQL Data Warehouse (DWH), SnapLogic
Languages:
PySpark, Python, SQL
Technologies:
Apache Spark, Apache Iceberg, Apache Kafka, Delta Lake, Spark SQL, Parquet File Format