Key Responsibilities
- Toolchain Evaluation & Modernization
  - Evaluate legacy monitoring and alerting tools (e.g., BMC MainView, SolarWinds).
  - Recommend and integrate a unified observability stack using Splunk, Dynatrace, Grafana, and Elastic Stack.
  - Ensure end-to-end visibility across infrastructure, applications, and user experience.
- AIOps Enablement
  - Deploy AIOps capabilities (event correlation, noise reduction, predictive analytics) using Dynatrace and Splunk.
  - Enable intelligent alerting and root cause analysis using ML-based models.
  - Integrate ServiceNow ITOM for automated incident creation and enrichment.
- Automation & Self-Healing
  - Develop automation playbooks and runbooks (Python, PowerShell, Ansible) for common incident types.
  - Enable auto-remediation pipelines triggered by AIOps events.
  - Support auto-scaling, service restarts, and configuration drift correction.
- Observability Architecture & Implementation
  - Deploy log, metric, and trace collection using Elastic Stack and Dynatrace.
  - Define and implement Service Level Objectives (SLOs), error budgets, and MTTR/MTTD benchmarks.
  - Build dashboards in Grafana, Dynatrace, and ServiceNow Performance Analytics.
- Operational Process Reengineering
  - Redesign and automate event, incident, change, and problem management processes.
  - Align monitoring workflows with ServiceNow CMDB and CI health status.
  - Shift operations from reactive to proactive by leveraging predictive insights.
Requirements
Qualifications
- Education:
  - Bachelor's degree in Information Technology, Engineering, or Computer Science
  - Master's degree (preferred, not required)
- Experience:
  - 8–12 years of experience in IT operations, observability, or monitoring architecture
  - 3–5 years of hands-on experience in AIOps and automation
  - Strong background in Dynatrace, Splunk, SolarWinds, ServiceNow, Elastic, and BMC tools
- Core Competencies:
  - Observability architecture and integration
  - AIOps platforms and automation frameworks
  - ITOM/ITSM best practices (especially ServiceNow ITOM modules)
  - Scripting and tooling orchestration
  - Metrics design: MTTR, Uptime, Alert Fatigue Index
- Certifications (Preferred):
  - ITIL 4 Managing Professional
  - Dynatrace Associate/Professional
  - Splunk Core Certified Admin
  - DevOps / SRE Foundation