Prasanna srinivas Nandipati

HPC engineer
Hyderabad

Summary

HPC Engineer with a proven track record at Cloud Vertex Technologies, adept at optimizing workflows and improving system performance. Skilled in AWS services and Linux administration, with strengths in proactive problem-solving and stakeholder relationships that keep projects on track and downtime minimal. Committed to delivering solutions that drive efficiency.

Overview

8 years of professional experience
1 Certification

Work History

HPC Engineer

Cloud Vertex Technologies India Private Limited
05.2024 - Current
  • Achieved successful project outcomes by maintaining accurate documentation and meeting strict deadlines.
  • Optimized engineering processes by implementing innovative solutions and streamlining workflow.
  • Implemented an AWS solution to simplify file transfers from a remote SFTP server to S3.
  • Upgraded the Nextflow application to the latest version, accounting for breaking changes and warnings.
  • Enabled user access to Amazon SageMaker and helped users run ML pipelines.
  • Collaborated with various teams to reduce application downtime.
  • Helped end users automate CCP4 job submission by writing a custom Python script that parses input from an Excel file and submits the job.
  • Used AWS CodePipeline to deploy AWS ParallelCluster.
  • Used AWS CloudFormation to deploy AWS PCS (Parallel Computing Service).
  • Enabled external vendors to access internal S3 buckets by applying suitable policies.
  • Implemented an AWS solution to automatically stop and start EC2 instances on a defined schedule.
  • Troubleshot and resolved various application issues.


Specialist

HCL Technologies
11.2021 - 05.2024
  • Collaborated with cross-functional teams to achieve project goals on time and within budget.
  • Improved customer satisfaction rates through proactive problem-solving and efficient complaint resolution.
  • Followed all company policies and procedures to deliver quality work.
  • Applied security patches to the entire cluster using BCM (Bright Cluster Manager) by updating and pushing patched software images.
  • Renewed and verified the BCM licence.
  • Monitored application licence usage using LSF RTM.
  • Updated the MATLAB, Mathematica, and STAR-CCM+ licence managers.
  • Installed scientific simulation applications such as Mathematica, Monolix, COMSOL, MATLAB, and STAR-CCM+ through EasyBuild.
  • Took part in upgrading Slurm packages using BCM.
  • Managed Slurm queues and partitions.
  • Generated Slurm cluster usage reports and job efficiency reports using “sacct”.
  • Updated GPFS client packages after security patching.
  • Added nodes to and removed nodes from the GPFS cluster.
  • Installed and configured the Splunk forwarder to view node health metrics.
  • Mitigated security vulnerabilities according to Qualys scan recommendations.


Cloud Engineer

Cloud Vertex Technologies India Private Limited
06.2020 - 11.2021
  • Used metrics to monitor application and infrastructure performance.
  • Reduced server downtime by proactively monitoring cloud resources and addressing potential issues before they escalated.
  • Identified, analyzed and resolved infrastructure vulnerabilities and application deployment issues.
  • Assisted in migration projects from on-premises data centers to cloud environments, ensuring minimal disruption to business operations.
  • Enhanced cloud infrastructure efficiency by implementing advanced automation techniques and tools.
  • Provided technical support to internal stakeholders, diagnosing and resolving complex issues related to the organization's cloud environment.
  • Evaluated new cloud technologies and recommended solutions that aligned with organizational goals and objectives.
  • Monitored the status of HPC jobs.
  • Executed HPC operational tasks assigned through the ServiceNow ticketing tool.
  • Deployed on-demand HPC clusters using AWS ParallelCluster.
  • Deployed and managed HPC applications.
  • Applied Linux and Windows security patches periodically on AWS EC2 instances.
  • Monitored the health of cloud resources using New Relic and CloudWatch.
  • Created and maintained a custom Python script to dynamically add AWS EBS volumes to compute nodes during HPC job submission.
  • Validated data transfers from an AWS S3 source to an AWS S3 destination, and from a local mount point to an AWS S3 destination, using custom Python and shell scripts.

HPC System Administrator

Concept Information Technologies India Pvt. Ltd.
01.2019 - 06.2020
  • Reduced downtime by proactively identifying and resolving potential issues through thorough system monitoring.
  • Established effective communication channels between IT support staff and end-users, leading to improved issue resolution times overall.
  • Simplified troubleshooting processes by creating detailed documentation for system configurations, procedures, and best practices.
  • Checked the status of MPI, LSF, and Slurm parallel jobs.
  • Troubleshot HPC job errors in applications such as Ansys Fluent, Nfasant, and Altair Feko.
  • Monitored SAN storage health using Lenovo ThinkSystem Storage Manager.
  • Verified that the Lustre parallel file system was mounted and accessible on all nodes.
  • Created Lustre file system quotas for users.
  • Monitored and managed cluster data backups using Veritas Backup Exec.
  • Created a custom shell script to check Lustre file system mount points and mount any that were not mounted.
  • Created a custom shell script to shut down the cluster in sequential order when the temperature rises to a threshold value following a power loss.

Linux System Administrator

DHII Health Tech Pvt. Ltd
06.2017 - 01.2019
  • Implemented monitoring tools for real-time analysis of system performance, allowing for proactive identification of potential issues before they impacted users.
  • Improved system performance by optimizing Linux server configurations, implementing efficient backup processes, and performing regular maintenance tasks.
  • Provided hands-on support for end-users experiencing issues with Linux-based systems or applications, facilitating quick resolutions and minimal disruptions to productivity.
  • Proactively monitored and managed all compute, network, and storage infrastructure to maintain 24x7 availability.
  • Checked the status of LSF and PBS parallel jobs.
  • Created and serviced administrator and user accounts on Linux-based systems.
  • Ensured that the IBM GPFS parallel file system was mounted and accessible on all nodes.
  • Troubleshot hardware and OS-level issues.
  • Modified LSF queues according to requirements.
  • Monitored cluster health using the Ganglia monitoring tool.
  • Managed the cluster using the cluster management tools IBM IMM and HP CMU.

Education

Master of Technology - Embedded Systems

Sreyas Institute of Engineering And Technology
Hyderabad, India
04.2001 -

Bachelor of Technology - Electronics And Communications Engineering

Guru Nanak Institute of Technology
Hyderabad, India
04.2001 -

Board of Intermediate Education - Mathematics Physics Chemistry

Sri Chaitanya Junior College
Hyderabad, India
04.2001 -

Board of Secondary Education - English Mathematics Physics Chemistry

Sri Krishnaveni Talent School
Hyderabad, India
04.2001 -

Skills

Linux

AWS VPC, EC2, EBS, S3, IAM, Batch, Lambda, SageMaker, FSx for Lustre, CloudFormation, AWS ParallelCluster

Docker

CI/CD

MLOps

Bash, Python Scripting

Slurm, LSF, PBS

Accomplishments

  • Developed a Bash script to compare files in FSx for Lustre with the corresponding S3 bucket and collect files present only in Lustre and not in S3.
  • Achieved cost optimization by introducing the AWS Instance Scheduler solution to automate starting and stopping EC2 instances.
  • Simplified file transfer from external sources to internal S3.
  • Created a custom shell script to shut down the cluster in sequential order when the temperature rises to a threshold value following a power loss.
  • Automated application job submission using a custom Python script.

Certification

RHCSA
