Site Reliability Engineer
KIBO
ABOUT THIS ROLE

As an SRE, your primary responsibility is to ensure the reliability, scalability, and availability of the systems that power Kibo’s products and services. You will work closely with cross-functional teams to build and maintain these systems, and you will be responsible for monitoring them to proactively identify and address production issues.

ABOUT KIBO

KIBO is a composable digital commerce platform for B2C, D2C, and B2B organizations who want to simplify the complexity in their businesses and deliver modern customer experiences. KIBO is the only modular, modern commerce platform that supports experiences spanning B2B and B2C Commerce, Order Management, and Subscriptions. Companies like Ace Hardware, Zwilling, Jelly Belly, Nivel, and Honey Birdette trust Kibo to bring simplicity and sophistication to commerce operations and deliver experiences that drive value.

KIBO's cutting-edge solution is MACH Alliance Certified and has been recognized by Forrester, Gartner, IDC, Internet Retailer, and TrustRadius. KIBO has been named a leader in The Forrester Wave™: Order Management Systems, Q1 2025 and in the IDC MarketScape report “Worldwide Enterprise Headless Digital Commerce Applications 2024 Vendor Assessment”.

By joining KIBO, you will be part of a team of Kibonauts all over the world in a remote-friendly environment. Whether your job is to build, sell, or support KIBO’s commerce solutions, we tackle challenges together with the approach of trust, growth mindset, and customer obsession. If you’re seeking a unique challenge with amazing growth potential, then come work with us!

WHAT YOU’LL DO

Design, implement, and maintain cloud infrastructure and tooling to support software development, deployment, and operations.
Develop and enhance monitoring and alerting systems to proactively detect and resolve issues, ensuring system reliability.
Automate deployments, configurations, and testing to streamline administration and minimize operational risks.
Troubleshoot and resolve performance, availability, and security issues across distributed systems.
Lead post-mortems and root cause analyses to drive continuous improvement and prevent recurring incidents.
Ensure high availability and system reliability while participating in a 24x7x365 on-call rotation to address critical incidents.
Collaborate with engineering teams to build scalable, resilient, and secure infrastructure that meets customer needs.