Data Center Production Operations Engineer
Meta
**Summary:**
Meta is seeking a forward-thinking, experienced engineer to join the Production Operations Engineering team within Infra Data Centers. Our data centers, and the hundreds of thousands of servers installed in them, are the foundation on which our rapidly scaling infrastructure operates efficiently and on which our innovative services are delivered. Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced, technical environment where adaptability and flexibility will be key to their success. We seek an IT professional with advanced, hands-on technical skills in server hardware and Linux (ideally in a data center environment). Extensive knowledge of server administration and experience delivering complex projects in a large-scale, distributed data center environment are core competencies for this role. The candidate should also have practical knowledge and experience in at least one of the following core areas: hardware, OS repair, tooling and automation, or project management.
**Required Skills:**
Data Center Production Operations Engineer Responsibilities:
1. Perform deep dives into and analyze complex technical issues within the data center, ranging from automated tooling to hardware failures, Linux OS, and network issues
2. Work as a subject matter expert with cross-functional teams on large-scale data center projects and initiatives
3. Provide cross-data-center support, identify potentially larger issues, and communicate them effectively when they are found
4. Work with internal hardware teams and vendors to drive complex technical issues to resolution, take ownership of ensuring high hardware quality, and influence future designs to ensure ease of serviceability
5. Solve issues at scale using scripting, automation, and tooling
6. Use data to maximize server fleet uptime and utilization by understanding hardware failure rates and customer Service Level Agreements (SLAs). Identify trends and systemic issues in the fleet and drive them to resolution
7. Coach and mentor team members to evaluate and identify better ways to resolve issues, and define updates to tools and processes
8. Provide mentorship and be the go-to technical resource for management
9. Build cross-functional relationships and draw on your experience to influence policies and procedures that improve global data center operations
10. Participate in an on-call rotation
11. Use our ticketing system daily to support servers that are unavailable and need to be returned to capacity
**Minimum Qualifications:**
1. BS, BA, or BEng in a technical field, or commensurate experience
2. 5+ years of infrastructure or related experience
3. Knowledge of Linux and hardware systems support in an Internet operations environment
4. Experience managing multiple technical issues concurrently
5. Knowledge of the interdependencies of data center functions and technologies, including electrical, cooling, structured cabling, security, network, and server systems
6. Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
7. Time and project management experience
8. Experience modifying and developing in commonly used scripting or programming languages
**Preferred Qualifications:**
1. Experience with large-scale GPU-based systems
2. Experience debugging, modifying, and developing in commonly used scripting or programming languages, including Bash, PHP, Python, SQL, or Perl
3. Experience in a large-scale data center environment
4. Experience providing technical guidance to external vendors
**Industry:** Internet