Sandisk understands how people and businesses consume data, and we relentlessly innovate to deliver solutions that meet today’s needs and enable tomorrow’s next big ideas. With a rich history of groundbreaking innovations in Flash and advanced memory technologies, our solutions have become the beating heart of the digital world we’re living in and that we have the power to shape.
Sandisk meets people and businesses at the intersection of their aspirations and the moment, enabling them to keep moving and pushing possibility forward. We do this through the balance of our powerhouse manufacturing capabilities and our industry-leading portfolio of products that are recognized globally for innovation, performance and quality.
Sandisk has two facilities recognized by the World Economic Forum as part of the Global Lighthouse Network for advanced Fourth Industrial Revolution (4IR) innovations. These facilities were also recognized as Sustainability Lighthouses for breakthroughs in efficient operations. With our worldwide reach, we ensure the global supply chain has access to the Flash memory it needs to keep our world moving forward.
Job Description:
Sandisk's High-Performance Computing environments are key to bringing new storage solutions to market. As a Senior High-Performance Computing (HPC) Engineer in the IT Infrastructure team, you will be at the heart of Sandisk’s engineering and product development process, delivering the IT HPC infrastructure and services that empower engineering teams to develop new storage technologies and deliver high-quality products to market quickly.
As a member of the HPC as a Service (HPCaaS) team, you will be responsible for establishing and executing strategic objectives focused on improving the effective utilization of compute resources while meeting or exceeding customer service level agreements (SLAs) for job prioritization, job concurrency, and job throughput in our EDA compute clusters. This includes leading architectural innovation and pathfinding efforts to create and implement Sandisk’s next-generation grid computing environment. You will be expected not only to deliver on technical requirements and solutions but also to present your solutions to senior management. Responsibilities include, but are not limited to, working as an individual contributor, a team member, and a technical team lead to explore, define, and pilot new solutions with little supervision, and developing solutions, scripts, and/or processes to automate the management of services and tools as required. In this role, you will collaborate closely with EDA and hardware design stakeholders to define and deliver workload efficiency improvements across Sandisk’s EDA HPC infrastructure globally.
Role Overview:
Join our global engineering product development team to support and enhance multi-site, high-performance computing (HPC) infrastructure and services. You will design, implement, and maintain automation solutions while driving continuous improvements in performance and reliability.
Key Responsibilities:
- Manage and support distributed HPC environments across multiple locations, focusing on ASIC and GPU computing clusters.
- Design, deploy, and maintain Ansible automation for HPC and Unix systems.
- Troubleshoot complex issues within HPC clusters and file systems, performing root cause analysis and driving corrective actions.
- Develop and maintain comprehensive documentation for HPC infrastructure.
- Identify opportunities to automate repetitive tasks and improve system reliability.
- Recommend and implement performance enhancements for various workloads.
- Support a broad Electronic Design Automation (EDA) ecosystem, including licensing and workflow management.

Technical Environment:
- Workload managers: LSF, Slurm, NC
- EDA tools such as Cadence and Synopsys, and their workflows
- Automation of job submissions and workload management
- Monitoring and observability using Splunk and Grafana
- Infrastructure: Red Hat/CentOS Linux, NFS storage, automounters
- VDI: Exceed TurboX, VNC
- Unix/Linux authentication integrated with Active Directory
- Infrastructure automation through scripting and open-source tools

Qualifications:
- Bachelor’s degree in Computer Science or equivalent experience
- 10+ years of Linux systems administration, with strong expertise in Red Hat/CentOS production environments
- Proven experience with workload managers, especially LSF, Slurm, or NC
- Strong automation skills, with proficiency in at least two scripting languages (e.g., shell/bash, Python)
- Demonstrated ability to lead technical projects through their full lifecycle
- Excellent problem-solving, multitasking, and troubleshooting abilities in complex environments
- Outstanding interpersonal, customer service, and team collaboration skills, with a results-driven mindset

Additional Information:
Sandisk thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect, and contribution.
Sandisk is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us at [email protected] to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying.