United States
Software Development Snr Manager

Oracle Cloud Infrastructure is the foundation for Oracle’s next-generation cloud services. Our mission is to build the most secure, high-performance, and cost-effective infrastructure for enterprise workloads and emerging technologies. Within OCI, the AI Platform, Services & Solutions organization is enabling Oracle’s bold vision for generative AI and large-scale machine learning.

As Senior Manager of Software Development, you will lead a team focused on accelerating Oracle’s AI transformation by building and operating large-scale internal GPU clusters, self-service ML infrastructure, and end-to-end model lifecycle capabilities including training, tuning, and serving.

Help shape the core infrastructure powering Oracle’s generative AI and machine learning solutions.
Tackle some of the most challenging problems in AI infrastructure at enterprise scale.
Collaborate with world-class teams and leaders driving innovation in cloud and AI.
Be part of a high-visibility initiative central to Oracle’s future.

This role requires strong technical and leadership skills, with a deep understanding of cloud-native infrastructure, distributed systems, and modern AI/ML workloads. You will collaborate across OCI and Oracle’s product teams to power internal and customer-facing AI solutions at scale.

Responsibilities:

Lead the development and operations of large internal GPU clusters supporting high-performance model training and inference.
Build self-service capabilities for engineers and data scientists to manage GPU workloads, monitor usage, and streamline ML workflows.
Design and deliver model lifecycle services, including training, fine-tuning, evaluation, and scalable model serving.
Collaborate with internal science, product, research, and infrastructure teams to ensure AI workloads are optimized for performance, cost, and reliability.
Ensure strong security, observability, and operational best practices across the platform.
Mentor and grow a high-performing engineering team; establish a strong culture of technical excellence, accountability, and innovation.
Drive strategic investments and architecture decisions in support of Oracle’s AI strategy.
Stay abreast of emerging technologies and industry best practices, ensuring compliance and driving innovation within the organization.
Coach, mentor, and develop top talent.
Set goals and expectations for performance, and work with employees to establish specific, measurable goals and commitments.

Basic Qualifications:

3+ years of management experience in enterprise software
7+ years of experience in application development, with 2+ years in large-scale distributed applications, web services, or systems design
Proficiency in programming in C/C++ (preferred), Rust (preferred), and Java
Bachelor’s degree in Computer Science and Engineering or a related engineering field

Preferred Qualifications:

10+ years of experience in software engineering with 3+ years in a leadership or management role.
Proven track record delivering scalable, production-grade systems in cloud infrastructure or AI/ML platforms.
Deep knowledge of distributed computing, container orchestration (e.g., Kubernetes), and GPU-based infrastructure.
Experience with model training pipelines, hyperparameter tuning, checkpointing, and model versioning at scale.
Familiarity with ML frameworks (e.g., PyTorch, TensorFlow), model deployment strategies, and inference optimization.
Strong problem-solving skills, data-driven decision-making, and ability to navigate complex cross-functional initiatives.
Excellent written and verbal communication skills.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

 
