MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data.
We enable organizations of all sizes to easily build, scale, and run modern applications by helping them modernize legacy workloads, embrace innovation, and unleash AI.
Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across AWS, Google Cloud, and Microsoft Azure.
Atlas allows customers to build and run applications anywhere—on premises, or across cloud providers.
With offices worldwide and over 175,000 new developers signing up to use MongoDB every month, it’s no wonder that leading organizations, like Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications.
Cloud Operations Engineers are responsible for building internal tools and process automation.
Day-to-day duties include creating and monitoring system alert dashboards, reviewing critical event and system logs, accessing the customer instances that underpin their production databases, and performing server administration tasks, including performance troubleshooting.
Applicants must be critical thinkers who are quick to detect, resolve, or escalate issues that are sometimes broad in scope and difficult to trace.
We are looking to speak to candidates who are based in Bengaluru for our hybrid working model.
Responsibilities
Help scale the Cloud Operations Engineering team with the strategic implementation and refinement of processes and tools
Provide career development feedback and advice to direct reports
Identify and measure team health indicators and performance metrics
Ensure proper team focus on priorities, objectives, and related deliverables
Collaborate with technical and non-technical teams across the company
Balance your time between leading your team, working on customer incidents, and being involved in projects
Be a source of guidance and advice to your own team members and other teams within MongoDB
Build a relationship of trust with your team
Successfully coordinate with a global team of Cloud Operations Engineers who are tasked with ensuring our uptime guarantees to the MongoDB Atlas customer base
Participate in designing and building internal tools
Assist in scoping, designing, and deploying systems that reduce Mean Time to Resolve for customer incidents
Monitor and detect emerging customer-facing incidents on the Atlas platform, and assist in their proactive resolution
Automate internal processes, routine monitoring, and troubleshooting tasks
Diagnose live incidents, differentiate between platform issues and usage issues, and take the next steps toward resolution
Cooperate with our Product Management and Cloud Engineering organizations by identifying areas for improvement in the management applications powering the Atlas infrastructure
Coordinate and participate in a weekly on-call rotation, handling short-term customer incidents (from direct surveillance or through alerts from our Technical Services Engineers)
Requirements
Management skills, with hands-on experience running small to mid-sized engineering teams in a rapid-growth environment
A strong diagnostic and troubleshooting process, with significant experience troubleshooting end-to-end technical issues in production environments
Experience supervising, leading, and monitoring the progress of software development projects
Patience, empathy, and a genuine desire to help others
Excellent written and verbal communication skills
Ability to think on your feet, remain calm under pressure, and find solutions to challenges in real time
Experience as an on-call DevOps, SRE, or Cloud Operations engineer
Expertise in Linux system administration and networking technologies
Knowledge of database and distributed systems operations and concepts
Knowledge of a wide range of web and internet technologies
Familiarity with Amazon Web Services and other cloud infrastructure platforms (e.g., GCP, Azure)
Experience in monitoring, system performance data collection and analysis, and reporting
Ability to write programs and scripts that address both short-term systems problems and long-term strategic objectives for the Atlas product
A CS/CE degree or equivalent experience
At least two of the following programming languages: Java, Go, Python, TypeScript
A keen interest in learning new skills and competencies