Remote, MH, India
13 days ago
Sr. IT Monitoring Engineer/SRE (Remote)

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role
The CrowdStrike Information Technology team is looking for a Senior IT Monitoring Engineer/Site Reliability Engineer (SRE) to lead the design, implementation, and evolution of our enterprise monitoring and observability platforms. In this leadership role, you will architect scalable monitoring solutions, drive reliability initiatives, and serve as a technical authority for monitoring best practices. You will mentor junior team members, collaborate with cross-functional teams to establish SLOs, and play a key role in major incident management. This position requires advanced technical expertise, strategic thinking, and the ability to balance operational excellence with innovation.

What You’ll Need:

Required Skills and Qualifications

8+ years of experience with enterprise monitoring platforms and observability tools (LogicMonitor, Datadog, LogScale, Zscaler Digital Experience (ZDX), ThousandEyes)

Advanced proficiency in multiple scripting/programming languages (Python, Go, Bash)

Expert knowledge of modern monitoring ecosystems (Prometheus, Grafana, ELK)

Demonstrated experience architecting monitoring solutions at scale across hybrid environments

Strong background in SRE practices, including SLO definition, error budgets, and reliability engineering

Advanced knowledge of cloud platforms (AWS, GCP) and their native monitoring capabilities

Expertise in log aggregation, metric and KPI collection, and distributed tracing implementations

Experience designing and implementing automated remediation systems

Strong understanding of Infrastructure as Code and GitOps principles

Proven ability to mentor junior engineers and provide technical leadership

Shift timings: 12 PM to 9 PM IST

What You'll Do:

Technical Leadership

Architect and implement enterprise-wide monitoring and observability solutions

Establish monitoring standards, best practices, and governance frameworks

Lead the evaluation and adoption of new monitoring technologies and approaches

Design scalable, resilient monitoring infrastructure, managed as Infrastructure as Code

Serve as the technical escalation point for complex monitoring issues


Reliability Engineering

Lead the implementation of SRE practices across the organization

Partner with service owners to define appropriate SLOs and error budgets

Drive reliability improvements through data-driven analysis and recommendations

Design and implement advanced alerting strategies

Develop comprehensive observability strategies covering metrics, logs, and traces


Incident Management

Lead major incident response for critical service disruptions

Conduct thorough post-incident reviews and drive systematic improvements

Establish incident management processes and tooling improvements

Mentor team members on effective incident response techniques

Analyze incident patterns to identify and address systemic issues


Strategic Initiatives

Develop the monitoring and observability roadmap aligned with business objectives

Lead monitoring platform migrations and major upgrades

Implement cost optimization strategies for monitoring infrastructure

Drive automation initiatives to reduce toil and improve operational efficiency

Collaborate with security teams to integrate security monitoring capabilities


Team Development

Mentor junior engineers on monitoring best practices and SRE principles

Provide technical guidance and code reviews for monitoring implementations

Create documentation and knowledge-sharing materials for the broader organization

Contribute to hiring and team development activities

Foster a culture of continuous improvement and learning

Bonus Points:

Advanced certifications in cloud platforms or SRE practices

Experience leading incident response for complex, high-impact service disruptions

Experience with AIOps and ML-based monitoring approaches

Background in performance engineering or capacity management

Experience with chaos engineering and resilience testing

Bachelor's or Master's degree in Computer Science, Engineering, or related field

Benefits of Working at CrowdStrike:

Remote-friendly and flexible work culture

Market leader in compensation and equity awards

Comprehensive physical and mental wellness programs

Competitive vacation and holidays to recharge

Paid parental and adoption leave

Professional development opportunities for all employees regardless of level or role

Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections

Vibrant office culture with world class amenities

Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.