Manage a team that develops and operates the virtual cloud network dataplane service.
The Oracle Cloud Infrastructure (OCI) team offers you the opportunity to build and operate a suite of massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI is committed to providing the best in cloud products that meet the needs of our customers, who are tackling some of the world’s biggest challenges.
Oracle’s Cloud Infrastructure (OCI) team is a new ground-up effort to build Infrastructure-as-a-Service that operates at high scale in a broadly distributed, multi-tenant cloud environment. Our customers run their businesses on our cloud, and our mission is to provide them with best-in-class compute, storage, networking, database, security, and an ever-expanding set of foundational cloud-based services. These are exciting times in our space: we are growing fast, are still at a relatively early stage, and are working on ambitious new initiatives.
We’re looking for a Software Development Manager with expertise in and passion for building teams, coaching individuals, and solving difficult problems in distributed systems, virtualized network infrastructure, and highly available services. Our new initiatives extend into the software assurance and security areas.
As a Software Development Manager, you and your team (direct or dotted-line) will solve exciting technical challenges by analyzing, troubleshooting, and designing vital Oracle Cloud services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance. You will focus on engineering improvements to the systems that eliminate whole classes of issues.
What You'll Do
Service Accountability – You will lead a team whose mission is to build core network infrastructure that is highly performant, scalable, and reliable.
Ownership Scope – You own the performance of the service as it relates to the networking dataplane, as well as the roadmap that determines the key features required by our customers and partner teams.
Operations Engineering – You will understand and be able to communicate the scale, capacity, security, performance attributes, and requirements of the service(s) you own. We are subject matter experts, able to understand and communicate every characteristic of our service stack, such as: degradation and behavior under load of the services and their dependencies; instrumentation and metrics that clearly describe the service behaviors; scaling requirements and patterns; and resiliency and recoverability, ensuring that backup/restore and disaster recovery capabilities are implemented, tested, and maintained.
Automation – You will have a clear understanding of automation and orchestration principles, and will be eager to help automate wherever and whenever the possibility arises, while simultaneously eliminating technical debt. Automation must be part of your DNA.
Incident Management – You will own and respond to customer incidents within the agreed-upon SLAs. You will manage the on-call rotation for your services.
Technical Experts – You will have a deep understanding of service topology and the dependencies required to troubleshoot issues and define mitigations. You will bring this expertise to bear in driving reliability improvements in the services you engage with.
Cross-team Collaboration – You will engage with and present to a wide variety of audiences, ranging from individual contributors and teams to senior leadership.
Basic Qualifications:
BS degree in Computer Science or a related technical field.
5+ years of software engineering experience.
4+ years of experience leading or managing a team of software engineers building network infrastructure.
Led or managed teams working on the networking dataplane.
Hands-on ability to architect, design, and oversee the development and implementation of features.
Experience with C/C++ and Python.
Experience with Linux kernel modules and device drivers.
Strong experience with network protocols, tools, and troubleshooting.
Debugging with tools such as gdb and core dump analysis.
Unit test, feature test, and system test planning, execution, and automation.
Resolving customer issues and handling escalations.
Experience deploying code within change management procedures.
Understanding of service KPI metrics, alarms, logging, and system health.
Experience working in an operational environment with mission-critical tier-one services and the associated on-call responsibilities.
Experience mentoring and growing engineers who build network infrastructure.
Preferred Qualifications:
Experience with resiliency design and operation.
Experience with and understanding of security and compliance.
Experience managing large fleets.
Experience with performance tuning and optimization.
Expertise in automation methodologies.