Hungry, Humble, Honest, with Heart.
The Opportunity
We are reimagining observability at Nutanix with Panacea.ai, our next-gen AI-driven log and metrics analyzer. In version 1.0, we used regex-based filters to surface anomalies. Now we’re building Panacea.ai, powered by AI/ML, ModernBERT, and LLMs, to deliver intelligent, context-rich anomaly detection, automated root cause analysis (Auto-RCA), and continuous learning from user feedback.
As a Staff Engineer (MTS-6), you will own the architecture and AI/ML systems that power both log and metrics analysis, enabling automated diagnostics and reducing triage time for QA failures, regression runs, and customer issues. You’ll also help define and drive the central AI charter at Nutanix, building reusable components, model infrastructure, and scalable ML services.
About the Team
The Panacea team is a passionate group of engineers across our India and US offices. We move fast, collaborate closely, and care deeply about quality and ownership. Our mission is to deliver AI/ML-powered developer productivity tools that solve real engineering and support pain points at scale.
Why Join Us
Your Role
AI-Powered Observability Platform: Own the vision, architecture, and delivery of Panacea’s ML-based log and metrics analyzer that reduces triage time and improves engineering efficiency.
AI/ML-Powered Log Analyzer Tool: Use deep learning (e.g., ModernBERT) to represent log messages, detect anomalies, group patterns, and surface actionable insights to users.
Metrics Anomaly Detection Engine: Build robust ML models to detect anomalies in time-series metrics such as CPU, memory, disk I/O, network traffic, and service health, automatically identifying performance degradation or system regressions across distributed environments.
Auto-RCA Engine: Combine log and metrics signals with graph-based correlation and LLM-powered summarization to automatically diagnose the root cause of system failures.
Feedback Loop & Continuous Learning: Build infrastructure for incorporating user feedback to continuously retrain and improve anomaly detection systems.
LLM Integration: Integrate LLMs for user queries, problem summarization, anomaly explanation, and contextual recommendations.
Central AI Charter: Contribute to Nutanix’s foundational AI platform by defining shared tooling, datasets, governance, and reusable ML components across products.
Responsibilities
What You Will Bring
Educational Background: B.Tech/M.Tech in Computer Science, Machine Learning, AI, or a related field.
Experience: 12+ years of engineering experience, including designing, developing, and deploying AI/ML systems at scale.
ML Expertise:
  Strong in time-series anomaly detection, statistical modeling, and supervised/unsupervised learning.
  Experience building ML models for metrics data (CPU, memory, IOPS, network, etc.) using models like Isolation Forest, Prophet, LSTM, or deep autoencoders.
  Expertise in NLP using ModernBERT or BERT for log classification, clustering, and summarization.
  Experience with LLMs for downstream tasks like summarization, root cause reasoning, or intelligent Q&A.
Engineering Skills: Strong Python background; hands-on with ML libraries (PyTorch, TensorFlow, Scikit-learn), time-series frameworks, and MLOps tools. Familiar with data pipelines and model serving.
Observability Knowledge: Hands-on with logs, metrics, traces, and popular monitoring tools (e.g., Prometheus, Grafana, ELK).
Leadership: Ability to independently drive projects from requirements to delivery, mentor junior engineers, and deliver business impact.
Work Arrangement
Hybrid: This role operates in a hybrid capacity, blending the benefits of remote work with the advantages of in-person collaboration. For most roles, that will mean coming into an office a minimum of 2 to 3 days per week; however, certain roles and/or teams may require more frequent in-office presence. Additional team-specific guidance and norms will be provided by your manager.