Oracle
Our team is responsible for designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and InfiniBand.
Why Join Us?
Innovative Projects: Build groundbreaking solutions for our customers from the ground up.
Exciting Times: Be part of a young, fast-growing team working on ambitious new initiatives.
Dynamic Environment: Collaborate in a vibrant, agile environment where learning and adaptability are key.
What We’re Looking For:
Adaptable Engineers: Self-motivated individuals who learn quickly.
Technical Excellence: Rock-solid developers and distributed systems engineers with a deep understanding of distributed systems and algorithms, comfortable diving deep into any part of the stack, debugging software, and troubleshooting low-level systems.
Passion for Simplicity and Scale: Value simplicity and scalability in design and implementation.
Collaborative Spirit: Comfortable working in a collaborative, agile environment and eager to learn. Able to work effectively with the teams we depend on, including Network and Data Center operations.
Join us and be a part of the team that's pushing the boundaries of AI technology!