Job description
 
Job Role: Sr DevOps – Observability and Monitoring
Experience: 10+ Years
Location: Mumbai (Onsite)
About the Role:
We are seeking an experienced Senior DevOps Observability and Monitoring Lead to design, implement, and manage comprehensive monitoring and observability solutions across our cloud and on-premises infrastructure.
The role focuses on ensuring system reliability, performance, and proactive incident management through advanced monitoring, alerting, and observability strategies.
Key Responsibilities:
Lead the design, deployment, and maintenance of observability frameworks across applications and infrastructure.
Implement and manage monitoring, logging, tracing, and alerting solutions using tools such as Prometheus, Grafana, ELK Stack, Datadog, Splunk, or equivalent.
Collaborate with development, QA, and operations teams to ensure performance, availability, and reliability of critical systems.
Define and enforce best practices for monitoring, incident management, and observability across the organization.
Develop dashboards, metrics, and reports to provide actionable insights to stakeholders.
Implement automated alerting, anomaly detection, and root cause analysis processes.
Optimize monitoring solutions for scalability, performance, and cost-efficiency.
Mentor junior engineers and promote a culture of proactive system health and observability.
Evaluate and recommend new tools and technologies to enhance observability and monitoring capabilities.
Key Skills and Qualifications:
10+ years of experience in DevOps, cloud infrastructure, and observability/monitoring roles.
Strong hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog, Splunk, New Relic).
Solid understanding of cloud platforms (AWS, Azure, GCP) and hybrid infrastructure.
Experience with logging, tracing, and metrics collection for large-scale distributed systems.
Strong scripting and automation skills (Python, Bash, PowerShell) for monitoring and alerting workflows.
Knowledge of CI/CD pipelines, containerization (Docker), and orchestration (Kubernetes) is a plus.
Excellent problem-solving, leadership, and stakeholder management skills.
Proven experience in defining observability strategies and leading monitoring initiatives in enterprise environments.
 
                    
                    