NOTE: Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.
The Meraki cloud supports millions of customer devices from 10 data centers and numerous public cloud regions around the world. Meraki’s customer base has grown by a factor of 2-3 every year, and the cloud serves tens of billions of HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure of network switches, security appliances, wireless APs, cameras, sensors and more.
As SREs at Meraki, we are responsible for building and growing the cloud that supports these customers and their networks. As a Site Reliability Engineering Technical Leader on the Performance team, you’ll take the lead on ensuring the speed, reliability and efficiency of the Meraki cloud. You’ll work across engineering teams, applying expertise in systems performance, distributed systems and applied observability to find and fix performance problems, and you’ll enable others to do the same.
We're a team of passionate software engineers who value quality and customer experience. Our team is based in the US, EMEA and APAC.
Responsibilities and Projects include:
Analyze and improve the performance of production systems, including APIs, backend services, storage, databases, and infrastructure components.
Design and implement performance monitoring, benchmarking and profiling strategies across services and platforms.
Collaborate with development and SRE teams to troubleshoot performance regressions and scalability bottlenecks.
Lead performance-focused chaos testing, capacity planning, and load testing initiatives.
Build tools and dashboards to surface latency, throughput, and system utilization metrics across the stack.
Participate in design reviews to ensure systems are built with performance and scalability in mind from day one.
Drive incident response and root cause analysis for performance-related incidents in production.
Mentor other engineers on performance best practices, observability, and system tuning.
You are an ideal candidate if:
You have 5+ years of experience in SRE, DevOps, or infrastructure-focused engineering roles with a strong performance focus.
You are proficient in at least one common programming language (e.g. Ruby, Python, Go, Rust), and at least one scripting/automation language or tool (e.g. Bash, Ansible).
You have a deep understanding of Linux internals, networking, filesystems, memory management, and containers.
You are comfortable with complex, large-scale distributed systems including components like container orchestration, load balancers, databases, and storage systems.
You have hands-on experience with monitoring and observability tools (e.g. Prometheus, Grafana, OpenTelemetry, eBPF, Flamegraphs), and can work with other teams to apply them to better monitor their services.
You have experience designing and running load tests using tools like Locust, k6, Gatling, or JMeter.
You want to work on a highly autonomous team that cares deeply about quality and customer experience.
You are curious, learn fast, and feel comfortable diving into unfamiliar code and systems to solve problems.
You are willing to be part of a production on-call rotation.