
Lead Site Reliability Engineer (ServiceNow Platform), Hyderabad: VREZOLV PARTNERS PRIVATE LIMITED




Job description

<h3 style="margin-top:19px; margin-bottom:19px"><span style="font-size:14pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><b><span lang="EN-US" style="font-size:12.0pt"><span style="line-height:116%">Lead Site Reliability Engineer (ServiceNow Platform)</span></span></b></span></span></span></span></span></h3> <h4 style="margin-top:21px; margin-bottom:21px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><span style="font-style:italic"><b>What you get to do in this role:</b></span></span></span></span></span></span></h4> <p style="margin-top:16px; margin-bottom:16px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">As the <b>Lead Site Reliability Engineer (SRE)</b>, you will spearhead the design and implementation of observability and reliability strategies across our ServiceNow platform and integrated third-party systems.

You'll lead the effort to establish and mature telemetry frameworks that provide visibility into the four golden signals (<b>latency, traffic, errors, and saturation</b>) to drive proactive performance and availability management.</span></span></span></p> <p style="margin-top:16px; margin-bottom:16px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">This role is both strategic and hands-on.

You will mentor other engineers, collaborate with cross-functional teams, and influence platform-wide improvements.

Your work will directly enhance system resilience, user experience, and operational excellence.</span></span></span></p> <h4 style="margin-top:21px; margin-bottom:21px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><span style="font-style:italic"><b>Key Responsibilities:</b></span></span></span></span></span></span></h4> <ul> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Architect and implement <b>telemetry and observability frameworks</b> across ServiceNow and its ecosystem.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Define and monitor <b>golden signals</b> to drive proactive SRE practices.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Lead <b>incident and problem management reviews</b>, ensuring data-driven root-cause analysis and continuous improvement.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Collaborate with development, support, and infrastructure teams to implement <b>self-healing</b>, <b>auto-remediation</b>, and <b>resiliency patterns</b>.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Develop and mature dashboards and real-time alerts using the ServiceNow platform alongside tools such as <b>Datadog, Splunk, or Grafana</b>.</span></span></span></li>
<li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Drive automation for reliability checks, capacity planning, and environment health.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Establish and promote <b>SRE best practices</b>, playbooks, and operational readiness standards across product teams.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Represent SRE in architectural reviews and platform governance meetings.</span></span></span></li> <li style="margin-top:16px; margin-bottom:16px; margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Mentor junior engineers, foster a learning culture, and ensure adoption of reliability-first principles.</span></span></span></li>
</ul> <p style="margin-bottom:11px"> </p> <h3 style="margin-top:11px; margin-bottom:5px"><span style="font-size:14pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><b><span lang="EN-US" style="font-size:12.0pt"><span style="line-height:116%">Qualifications:</span></span></b></span></span></span></span></span></h3> <ul> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><b>10+ years of IT experience</b>, with <b>5+ years in SRE or production engineering</b>, and <b>2+ years in a lead or principal role</b>.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Proven experience managing <b>observability, telemetry, and incident response</b> frameworks at scale.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Deep understanding of <b>ITIL-aligned processes</b> (Incident, Problem, and Change Management).</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Strong <b>leadership and collaboration</b> skills, with the ability to influence both engineering and business teams.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Excellent verbal and written communication skills, especially in explaining technical decisions to business stakeholders.</span></span></span></li>
</ul> <p style="margin-bottom:11px"> </p> <h3 style="margin-top:11px; margin-bottom:5px"><span style="font-size:14pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><b><span lang="EN-US" style="font-size:12.0pt"><span style="line-height:116%">Technical Requirements:</span></span></b></span></span></span></span></span></h3> <ul> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Strong experience with <b>monitoring tools</b> such as <b>Datadog, Splunk, Prometheus, or Grafana</b>, or equivalents.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Proficient in <b>ServiceNow platform administration</b>, performance tuning, and API integrations.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Solid command of <b>Unix/Linux internals</b>, system performance tuning, and network troubleshooting.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Proficient in one or more scripting languages: <b>Python, Shell, or JavaScript</b>.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Hands-on experience with <b>Kubernetes</b>, <b>containers</b>, and <b>CI/CD pipelines</b>.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Deep understanding of <b>HTTP/S, DNS, SSL/TLS</b>, and related network protocols.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Familiarity with <b>cloud platforms</b> (AWS, Azure, or GCP); <b>relevant certifications preferred</b>.</span></span></span></li>
</ul> <p style="margin-bottom:11px"> </p> <h3 style="margin-top:11px; margin-bottom:5px"><span style="font-size:14pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif"><span style="color:#0f4761"><span style="font-weight:normal"><b><span lang="EN-US" style="font-size:12.0pt"><span style="line-height:116%">Preferred (Nice to Have):</span></span></b></span></span></span></span></span></h3> <ul> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Experience with <b>ServiceNow ITOM modules</b> such as <b>Event Management, AIOps, and Discovery</b>.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Knowledge of <b>AI/ML-based anomaly detection</b> and alerting strategies.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Experience with <b>infrastructure-as-code</b> using tools such as <b>Ansible or Terraform</b>.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">Familiarity with <b>performance profiling and diagnostics</b> of complex applications.</span></span></span></li> <li style="margin-left:8px"><span style="font-size:12pt"><span style="line-height:116%"><span style="font-family:Aptos,sans-serif">A track record of establishing <b>SRE teams or practices</b> from the ground up.</span></span></span></li>
</ul>




