Suraj Akula

Hyderabad

Summary

Results-oriented Data Engineer with 3+ years of experience building scalable data pipelines and orchestrating workflows with Azure Data Factory and Azure Databricks. Skilled in developing cloud-based data lakehouse architectures using Databricks, Delta Lake, Apache Spark, PySpark, and SQL. Demonstrated success in optimizing ETL pipelines and improving data quality and governance. Strong working knowledge of Delta Lake features, including ACID transactions, schema enforcement, and time travel, to support both real-time and batch analytics for enterprise decision-making.

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

Modak Analytics, Syngenta
10.2023 - Current
  • Engineered real-time data ingestion pipelines using SnapLogic to land raw data as Parquet files in an external AWS S3 bucket for downstream processing in Databricks.
  • Designed and implemented high-performance ETL pipelines on Azure Databricks, processing structured and semi-structured data from multiple enterprise systems.
  • Implemented a robust incremental data loading mechanism using Delta Lake within the Medallion Architecture. Enhanced performance by applying incremental loads across 40+ tables, ensuring data integrity and significantly reducing processing times.
  • Created and optimized PySpark jobs in Databricks to process and transform large datasets for downstream analytics.
  • Developed a scalable Azure-based data lakehouse architecture using Databricks Delta Lake to support ACID transactions, schema evolution, and time travel for reliable data versioning.
  • Implemented comprehensive data quality frameworks within Databricks to ensure accuracy, consistency, and integrity across both batch and streaming pipelines.
  • Created detailed documentation for the entire data engineering process, ensuring easy knowledge transfer and onboarding for future projects.
  • Developed and conducted training sessions for team members, empowering them with the skills needed to effectively utilize the implemented data engineering solutions.
  • Participated in Agile development processes, including sprint planning, backlog grooming, and daily stand-ups, to ensure the timely delivery of projects.

Data Engineer

Modak Analytics, AbbVie
04.2022 - 09.2023
  • Designed and implemented a comprehensive Azure Data Factory solution for ETL processes, utilizing SQL, Data Factory pipelines, Databricks, and PySpark to handle the extraction, transformation, and loading of data from diverse sources.
  • Implemented security best practices by leveraging Azure Key Vault for storing and managing sensitive information such as credentials and secrets.
  • Incorporated Logic Apps to automate email notifications, enhancing system monitoring and alerting capabilities for timely issue resolution.
  • Developed and optimized ETL pipelines in Databricks using PySpark and SQL to process and store data.
  • Developed robust batch processing workflows utilizing Delta Lake’s ACID transactions, schema enforcement, and time travel features to maintain high data quality and governance.
  • Designed scalable PySpark/SQL jobs to efficiently ingest, transform, and load data into Delta Lake, enabling seamless analytics and reporting.
  • Reduced ETL runtime by 50% through advanced partitioning, Spark tuning, and file compaction.
  • Managed CI/CD using Git and GitLab for seamless deployment of Databricks workflows and code.
  • Collaborated with cross-functional teams to gather business requirements and translate them into effective data engineering solutions.

Education

Bachelor of Technology (B.Tech) - Information Technology

Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad
05.2022

Skills

  • Platforms: Azure, Azure Databricks (ADB), Azure Data Lake Gen2 (ADLS), Azure SQL Data Warehouse (DWH), SnapLogic
  • Languages: PySpark, Python, SQL
  • Technologies: Apache Spark, Apache Iceberg, Apache Kafka, Delta Lake, Spark SQL, Parquet File Format
  • Cloud Services & Tools: Azure Data Factory (ADF), Azure SQL, Azure Key Vault, Databricks Workflows, Linked Services, Integration Runtimes, Monitoring Pipelines, Triggers
  • DevOps & Version Control: Git, GitLab, CI/CD Pipelines
  • Data Modeling: Star Schema, Snowflake Schema, Data Warehousing, Stored Procedures, Views, Functions, Triggers, Data Flows

Certification

Databricks Certified Data Engineer Associate
