Results-driven, detail-oriented Data Engineer with two and a half years of experience designing and implementing robust data pipelines, performing complex data migrations, and managing data integration across cloud and on-premises platforms. Proficient in PySpark, SQL, and Databricks for efficient data transformation and processing. Adept at collaborating with cross-functional teams and clients to deliver high-quality solutions, streamline workflows, and resolve production and testing issues. Strong background in developing configuration-driven automation, conducting unit and reconciliation testing, and leading knowledge transfer to ensure smooth project execution.
Designed and implemented scalable ETL pipelines using PySpark, focusing on performance optimization and reusability.
Worked across multiple cloud ecosystems including AWS (Redshift, Glue, Lambda, DynamoDB) and Azure (Databricks, Data Lake, Synapse) to enable seamless data migration and integration.
Ingested data from a wide range of sources such as relational databases, flat files, XML/DDE formats, and enterprise applications like Salesforce.
Developed transformation logic across Bronze, Silver, and Gold (medallion) layers to support business intelligence and analytics use cases (illustrated in the first sketch following these bullets).
Ensured high data integrity by developing unit testing frameworks, writing validation scripts, and performing automated reconciliation checks (illustrated in the second sketch following these bullets).
Conducted historical and incremental load validation, identifying data mismatches and collaborating on root cause analysis (RCA).
Created robust QA configurations for various file formats, helping to streamline testing processes and reduce manual effort.
Developed and orchestrated data workflows hands-on in Databricks, implementing notebook-based pipeline executions.
Leveraged Control-M and native cloud schedulers to manage workflow dependencies and monitor performance.
Built reusable, configuration-driven ingestion frameworks for structured and semi-structured data formats (illustrated in the third sketch following these bullets).
Actively supported end-to-end data pipeline deployments across SIT, pre-production, and production environments.
Collaborated on environment setup, migration planning, and workflow validation, ensuring stability and accuracy in production releases.
Monitored pipeline execution and resolved post-migration and runtime issues through logs and alert-driven debugging.
Maintained detailed documentation of data models, pipeline configurations, and source system inventories to improve onboarding and collaboration.
Developed structured templates for mapping source data to pipeline layers, enabling better visibility for development and QA teams.
Contributed to process improvements by suggesting standardization of configurations and documentation practices.
Interfaced with client teams, architects, and QA stakeholders to clarify data requirements, resolve defects, and align on delivery timelines.
Proactively communicated environment-specific challenges, such as data discrepancies and access issues, with suggested mitigation steps.
Sent detailed, structured email updates to clients covering file sourcing, testing status, and ingestion scope clarifications.
Conducted knowledge transfer sessions for new team members, covering project background, tooling, and best practices in ETL and cloud data engineering.
Shared insights and techniques on Databricks development, notebook management, PySpark optimization, and error-handling mechanisms.
Acted as a point of contact for onboarding queries and provided functional walkthroughs to ensure effective team contributions.
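ILLUSTRATIVE CODE SKETCHES:
A minimal sketch of the medallion-layer (Bronze/Silver/Gold) PySpark transformation pattern referenced above; all paths, table names, and columns (orders, order_id, order_ts, amount) are hypothetical placeholders, and the Delta output format assumes a Databricks-style environment.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion_sketch").getOrCreate()

# Bronze: land raw source data as-is (path and schema are hypothetical)
bronze = spark.read.json("/mnt/raw/orders/")
bronze.write.mode("append").format("delta").save("/mnt/bronze/orders")

# Silver: cleanse and conform - deduplicate, enforce types, drop invalid rows
silver = (
    spark.read.format("delta").load("/mnt/bronze/orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
)
silver.write.mode("overwrite").format("delta").save("/mnt/silver/orders")

# Gold: business-level daily aggregate ready for BI consumption
gold = (
    silver.groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.count("*").alias("order_count"), F.sum("amount").alias("revenue"))
)
gold.write.mode("overwrite").format("delta").save("/mnt/gold/daily_order_summary")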
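A hedged sketch of the automated reconciliation checks mentioned in the testing bullets: row counts, a column checksum, and a key-level diff that feeds root cause analysis. The source/target locations and the order_id/amount columns are assumptions, not the actual project schema.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("reconciliation_sketch").getOrCreate()

# Hypothetical source table and its migrated counterpart
source = spark.read.format("delta").load("/mnt/silver/orders")
target = spark.read.format("delta").load("/mnt/migrated/orders")

# Row-count reconciliation
src_rows, tgt_rows = source.count(), target.count()
print(f"Row counts - source: {src_rows}, target: {tgt_rows}, match: {src_rows == tgt_rows}")

# Column-level checksum reconciliation on a numeric measure
src_sum = source.agg(F.sum("amount")).first()[0]
tgt_sum = target.agg(F.sum("amount")).first()[0]
print(f"Amount checksums - source: {src_sum}, target: {tgt_sum}, match: {src_sum == tgt_sum}")

# Key-level diff: keys present in source but missing from target (input to RCA)
missing = source.select("order_id").subtract(target.select("order_id"))
print(f"Keys missing from target: {missing.count()}")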
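A minimal sketch of a configuration-driven ingestion framework of the kind referenced above; the SOURCES list and ingest helper are hypothetical. Keeping source definitions in configuration rather than code lets new feeds be onboarded without touching pipeline logic, which is what makes the framework reusable.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingestion_sketch").getOrCreate()

# Hypothetical per-source configuration; in practice this could live in a JSON or YAML file
SOURCES = [
    {"name": "customers", "format": "csv", "path": "/mnt/raw/customers/",
     "options": {"header": "true", "inferSchema": "true"}},
    {"name": "events", "format": "json", "path": "/mnt/raw/events/", "options": {}},
]

def ingest(cfg):
    # Read one source exactly as its config entry describes, then land it in the Bronze layer
    df = spark.read.format(cfg["format"]).options(**cfg["options"]).load(cfg["path"])
    df.write.mode("append").format("delta").save(f"/mnt/bronze/{cfg['name']}")

for cfg in SOURCES:
    ingest(cfg)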
TECHNICAL SKILLS:
PROGRAMMING & SCRIPTING: Python (PySpark), SQL
CLOUD PLATFORMS: AWS (Redshift, Glue, Lambda, DynamoDB), Azure (Databricks, Data Lake Storage, Synapse)
OTHERS: Databricks, Control-M, medallion (Bronze/Silver/Gold) architecture, ETL pipeline development, data reconciliation and QA testing, Salesforce data integration
INTERPERSONAL SKILLS: Cross-functional and client collaboration, structured status communication, knowledge transfer and onboarding