Job description
 
**Hungry, Humble, Honest, with Heart.**  
**The Opportunity**  
At Nutanix, we’re redefining intelligent observability with Panacea.ai, an AI/ML-powered platform that automatically detects, explains, and correlates anomalies across logs and metrics.
In version 1.0, we used regex-based filters along with historical data to identify anomalies.
In version 2.0, we advanced to **AI/ML** capabilities that deliver deeper, context-rich anomaly detection, and we are now building an **enterprise-grade automated RCA (Root Cause Analysis) engine** powered by an **agentic platform** that integrates **MCP servers** as tools, along with a conversational interface that lets users query, explore, and discuss issues as naturally as in a chat.
We’re seeking a passionate and driven Engineer to work on this mission-critical initiative.
This is a hands-on, high-impact role in which you will work on this AI/ML innovation and help shape Nutanix’s central AI charter.
You’ll be at the forefront of building enterprise-scale, AI-first observability solutions.  
**About the Team**  
The **Panacea** team is a passionate group of engineers across our India and US offices.
We move fast, collaborate closely, and care deeply about quality and ownership.
Our mission is to deliver  **AI/ML-powered developer productivity tools**  that solve real engineering and support pain points at scale.  
**Why Join Us**  
+ Work alongside a **high-impact team** delivering AI-first observability tools that directly improve engineering velocity and product quality.
+ Tackle  **challenging technical and product problems**  at scale and speed.
+ Shape the  **foundational AI platform and practices**  across Nutanix.
+ Enjoy the **flexibility of hybrid work**, with a culture that values deep work, collaboration, and ownership.
+ Be part of a  **startup-style team**  backed by the scale, reach, and stability of a global cloud leader.  
**Your Role**  
+  **Auto RCA Engine:**  Deliver an AI-driven engine that correlates logs and metrics across distributed services, automatically surfacing explanations for incidents.
This includes an  **agentic platform**  that integrates  **MCP servers**  as tools, alongside a  **chat-like conversational interface**  that enables engineers to query issues, run diagnostics, and collaborate on RCA in natural language.
Here, LLMs will power interactive diagnostics and human-like discussions around problem-solving.
+  **AI-Powered Observability Platform:**  Own the vision, architecture, and delivery of Panacea’s ML-based log and metrics analyzer that reduces triage time and improves engineering efficiency.
This includes leveraging LLMs for anomaly explanation, RCA summaries, and contextual recommendations to engineers and support teams.
+  **Knowledge Base Creation:**  Build a robust  **company-wide knowledge base**  that consolidates product, observability, and system data into structured formats.
This knowledge base will serve as a foundation for LLMs, enhancing their ability to reason, answer queries, and provide deeper insights into system anomalies.
+ **Metrics Anomaly Detection:** Develop models that detect anomalies in **CPU, memory, disk I/O, network traffic, and service health**, enabling proactive identification of performance regressions.
LLMs will assist in summarizing anomalies and providing contextual recommendations for remediation.
+  **Feedback Loop & Continuous Learning:**  Build infrastructure that captures user interactions and feedback, using LLMs and ML pipelines to retrain and improve anomaly detection and RCA accuracy over time.
+  **Central AI Charter:**  Collaborate with product and support teams to define foundational AI infrastructure, shared ML components, governance practices, and standards that scale across Nutanix’s product ecosystem.  
**Responsibilities**  
+ Apply expertise in  **LangGraph, LangChain, agentic AI architectures, and multi-agent orchestration**  to build intelligent, scalable workflows.
+ Design and develop  **conversational AI systems**  (chatbots, copilots, or support assistants) for incident triage and RCA.
+ Implement correlation models to connect anomalies across logs and metrics, forming a cohesive **RCA narrative**.
+ Own the **end-to-end ML lifecycle**: data ingestion, feature extraction, model training, evaluation, deployment, and monitoring.
+ Build  **explainable AI systems**  that increase adoption and trust within engineering, QA, and support teams.
+ Collaborate with cross-functional stakeholders (SRE, QA, Dev) to deeply understand pain points and translate them into intelligent tooling.  
**What You Will Bring**  
+ **Educational Background:** B.Tech/M.Tech in Computer Science, Machine Learning, AI, or a related field.
+ **Experience:** 6+ years in software engineering, with a track record of designing, developing, and deploying AI/ML systems at scale.
+ **AI/ML Expertise:**
  + Strong background in time-series anomaly detection, statistical modeling, and supervised/unsupervised learning.
  + Experience building ML models for metrics data (CPU, memory, IOPS, network, etc.) using models like Isolation Forest, Prophet, LSTM, or deep autoencoders.
  + Experience with LLMs for downstream tasks like summarization, root cause reasoning, or intelligent Q&A.
  + Experience designing and deploying agentic workflows is preferred.
+  **Engineering Skills:**  Strong Python programming skills with proficiency in ML libraries (PyTorch, TensorFlow, Scikit-learn), time-series frameworks, and MLOps tools.
Experience building and operating robust data pipelines and serving models at scale.
+ **Observability Knowledge:** Familiarity with logs, metrics, and traces, along with monitoring tools such as Prometheus, Grafana, and ELK.  
**Work Arrangement**  
Hybrid: This role operates in a hybrid capacity, blending the benefits of remote work with the advantages of in-person collaboration.
For most roles, that will mean coming into an office a minimum of 3 days per week; however, certain roles and/or teams may require more frequent in-office presence.
Additional team-specific guidance and norms will be provided by your manager.
**We're an Equal Opportunity Employer**  
Nutanix is an Equal Employment Opportunity and (in the U.S.) an Affirmative Action employer.
Qualified applicants are considered for employment opportunities without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, marital status, protected veteran status, disability status or any other category protected by applicable law.
We hire and promote individuals solely on the basis of qualifications for the job to be filled.
We strive to foster an inclusive working environment that enables all our Nutants to be themselves and to do great work in a safe and welcoming environment, free of unlawful discrimination, intimidation or harassment.
As part of this commitment, we will ensure that persons with disabilities are provided reasonable accommodations.
If you need a reasonable accommodation, please let us know by contacting CandidateAccommodationRequests@nutanix.com. 
 
                    
                    