As a Site Reliability Engineer (SRE) at Ping Identity, you’ll play a critical role in the architecture, deployment, and reliability of one of the largest identity platforms in the world. Embedded directly with our development teams and operating under a DevOps model, you’ll be involved in the entire lifecycle of our SaaS offerings—from design and deployment to operations and continuous improvement.
You’ll join our Cloud Operations team, where the focus is on building and automating resilient infrastructure. You’ll help define what operational excellence looks like: designing systems that are scalable, redundant, and observable.
You will:
Design, build, and maintain production infrastructure on AWS, using infrastructure-as-code principles.
Develop and manage deployment pipelines for global infrastructure.
Investigate and resolve complex performance and application issues across distributed systems.
Build and maintain observability stacks including dashboards, alerts, and runbooks.
Administer and evolve modern infrastructure tooling (e.g., Terraform, Puppet, Jenkins).
Manage and automate Linux-based systems, including configuration and troubleshooting.
Plan for infrastructure capacity, manage traffic routing, and enforce security policies to support Ping’s Single Sign-On (SSO) SaaS platform.
Participate in a follow-the-sun on-call rotation to ensure platform reliability.
You have:
Degree in Software Engineering or a related technical field.
3+ years of experience in SRE, DevOps, or Cloud Engineering roles with a strong understanding of delivering production-grade software.
Solid experience with Amazon Web Services (AWS).
Proven hands-on expertise with infrastructure provisioning tools like CloudFormation and Terraform.
Proficiency in Docker and container orchestration frameworks such as Kubernetes.
Experience with configuration management tools such as Puppet, Chef, or Salt.
Skilled in Git workflows (branching, pull requests, merges) in team environments.
Familiarity with observability platforms like New Relic, Grafana, and CloudWatch.
Experience with CI/CD and automation tools like Jenkins and Artifactory.
Strong Linux/UNIX administration skills, including troubleshooting and scripting.
Solid grasp of networking principles in cloud-based environments.
Understanding of security best practices and their implementation in production systems.
Proficiency in at least one scripting or programming language (Python, Bash, Ruby, Go, etc.).
Experience supporting high-traffic or mission-critical production services.
You have an advantage if:
You have a background in software development or engineering.
You have a strong understanding of the Software Development Life Cycle (SDLC) and how reliability fits within it.
While this is a core Site Reliability Engineering role, we’re especially interested in candidates who also bring a software engineering mindset. If you enjoy writing clean, maintainable code to solve infrastructure problems, or you’ve worked closely with dev teams on service reliability, you’ll thrive here.
USA: $93,000 to $111,683
In accordance with Colorado’s Equal Pay for Equal Work Act (SB 19-085), the approximate compensation range for this role in Colorado is listed above. Final compensation will be determined by various factors, such as knowledge, skills, and abilities.