HPC SRE Engineer
Ford
We are seeking a highly skilled and motivated HPC (High Performance Computing) SRE Engineer to join our growing team. You will be responsible for monitoring and collecting metrics on our HPC infrastructure, triaging infrastructure issues from health monitoring alerts, and responding to customer incidents and tickets to ensure a high quality of service for our user community and that our SLAs are met. This role will also focus on deploying and maintaining the metrics, logs, and monitoring stack we use to verify the health of our systems, and on automating responses to system issues.
You'll have...
• Associate's degree in Computer Science, Engineering, or equivalent work experience
• 5+ years of experience in systems or software engineering
• Strong understanding of Linux operating systems, preferably in an HPC environment
• Experience with metrics collection tools (Prometheus or Elasticsearch)
• Experience building visualizations and alerts in tools like Grafana or Kibana
• Proficiency programming in one or more languages, preferably Go, Python, or Bash scripting
• A self-motivated attitude and the ability to respond autonomously to alerts and fix issues
• Strong communication and collaboration skills

You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply!

As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder…or all of the above? No matter what you choose, we offer a work life that works for you, including:

• Immediate medical, dental, and prescription drug coverage
• Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
• Vehicle discount program for employees and family members, and management leases
• Tuition assistance
• Established and active employee resource groups
• Paid time off for individual and team community service
• A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
• Paid time off and the option to purchase additional vacation time

For a detailed look at our benefits, see the Benefit Summary.

This role is based in Dearborn, MI. You will be required to be on-site 4 days/week.
*Visa sponsorship is NOT provided for this specific role*
*Relocation assistance is NOT provided for this specific role*

Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire.

We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call 1-888-336-0660.

What you'll do…
• Implement monitoring solutions to ensure the health and availability of critical infrastructure and applications.
• Collect metrics on system performance, service availability, and user experience.
• Respond to infrastructure alerts and user community tickets to resolve issues that may impact business continuity or cause us to miss our SLA targets.
• Build automation to restore health to hardware systems that have failed.
• Develop and maintain documentation for software and procedures.
• Stay up-to-date on the latest advancements in HPC technologies and best practices.