About Resolver
Resolver (formerly Crisp) is a social media safety and crisis monitoring firm. Our technology is used by some of the world’s largest and best-known brands and social platforms to detect critical issues and crises quickly. We help companies by spotting fake news and rumours spread about their brands on social media, which can damage sales and affect share price.
We also protect social media users from harmful and unwanted user-generated content, utilising hundreds of analysts and moderators operating 24/7 in over 50 languages.
The Role
This role sits within the Site Reliability Engineering team, part of the wider Resolver Technical Services business unit. Our SaaS platforms, distributed systems, and product integrations enable critical business operations and deliver industry-leading threat detection technology to our customers.
As an Associate Technical Operations Engineer, you will support the stability, availability, and performance of our SaaS application suite hosted on Google Cloud, AWS, and Azure. You will work closely with senior engineers and development teams to ensure operational excellence, maintain service reliability, and contribute to continuous improvement initiatives.
Key Responsibilities
Provide third-line support for infrastructure and applications within the SRE team.
Monitor and maintain cloud-based environments (AWS, Azure, GCP) to ensure high availability and resilience.
Respond to alerts from monitoring platforms and follow runbooks/SOPs for resolution.
Assist in incident management, including root cause analysis (RCA) and postmortem documentation for P1/P2 incidents.
Maintain and tune alerting thresholds to reduce false positives.
Contribute to the creation and updating of SOPs/runbooks for repeatable processes.
Support business-as-usual (BAU) activities, such as:
Elastic index snapshots
Security alert reviews
Daily volume alert triage
Reprocessing persistent failures
Collaborate with engineering teams to uphold Service Level Objectives (SLOs).
Adhere to ITIL practices and assist in problem management processes.
Essential Skills and Experience
Basic understanding of cloud platforms (AWS, Azure, GCP).
Strong troubleshooting and fault-finding skills.
Familiarity with monitoring solutions and alert management.
Knowledge of incident management processes.
Exposure to ITIL-based environments.
Willingness to work in a 24/7 operational support model (rotational shifts).