Job description
 
SRE Observability Platform Architect - 127074
 
  Observability Platform Architect  
     Experience:  
    · Minimum 10 years of relevant work experience setting up monitoring in critical production environments, using any product (Dynatrace, Datadog, ELK stack, Splunk, Grafana/Prometheus, etc.).
    · Minimum 5-6 years of work experience in end-to-end observability covering technical, user experience and business outcome metrics. Experience with AIOps is an advantage.
    · Experience working with applications hosted on private cloud and cloud-native public cloud (particularly AWS).
    · Multi-tenancy setup and data segregation on the observability and AIOps stack.
    · Designing and building an Observability & Maintenance (O&M) module for multi-tenant solutions.
    · Defining SLIs and setting up SLOs for multi-tenant solutions.
    Core Capabilities:  
    · Experience in implementing container, network, APM, RUM, log analytics, end-to-end tracing, and custom alerting with Grafana, Prometheus, and Grafana Loki (alternatively Logstash or Fluent Bit). Experience implementing the same on a third-party product such as Dynatrace is also considered.
    · Proficiency with containers and multi-tenancy setup for the observability solution is critical.
    · Ability to configure custom alerts and monitors, and to build AIOps workflows based on telemetry.
    · Good understanding of integrating with other systems via APIs, consuming external APIs for IAM, and ingesting metric-based telemetry via collectors.
    · Ability to build custom observability dashboards across different portfolios and personas.
    · Setting up Synthetic Monitoring and Test Automation, and integrating their telemetry into the observability stack.
    · Tenant and data segregation as well as ability to obfuscate sensitive information on the common observability schema.
    · Ability to code is preferred, ideally in Python or Java, along with Ansible scripting.
  Primary Location: Hyderabad, Andhra Pradesh, India
  Job Type: Experienced
  Years of Experience: 13
    Qualification:
    · Observability Foundation certification from DevOps Institute or any product-level accreditation.
    · Any recognized System Architecture qualifications (TOGAF) are a bonus.
    Role & Responsibilities:  
    · Architect, design and ensure implementation of the entire observability solution, to be packaged as a module in a multi-tenant private cloud solution.
    · Implement the observability solution so that it monitors all tenants and applies the same feature set to each (monitoring and acting upon telemetry from tenants, serving as a hypervisor).
    · Design and implement integrations, and expose APIs externally.
    · Set up authentication and authorization controls by integrating with an IAM layer.
    · Work with UI/UX teams to design dashboards for the Observability & Maintenance platform for both the tenants and the host.
    · Design and set up an AIOps module responsible for anomaly detection and for automated remediation workflows such as capacity scaling and container restarts.
    · Build Proof-of-Concept solutions to view end-to-end tube-maps / service flows for each tenant’s services.
    · Define and set up a CMDB to serve as a source for the infrastructure and application telemetry.
    · Work with other teams to ensure the system is well-tested and scalable, meeting tenant demands.
    · Define business-aligned SLIs and set SLOs for core services and journeys.
  Travel: No
 
                    
                    
Required Skill Profession: Computer Occupations