Job Description
<p><p><b>About the Role:</b><br/><br/>We are seeking a highly skilled Datadog Implementation Engineer to join our team and lead the design, implementation, and maintenance of Datadog monitoring and observability solutions.<br/><br/>The ideal candidate will have extensive hands-on experience with the Datadog platform, including APM, infrastructure monitoring, and cloud observability. This experience will help us ensure application performance, reliability, and security across diverse environments.<br/><br/><b>Key Responsibilities:</b><br/><br/>- Design, implement, configure, and maintain Datadog monitoring solutions across infrastructure, applications, cloud services, and security domains.<br/><br/>- Build and optimize application performance monitoring (APM) in Datadog, using distributed traces and spans to detect and diagnose issues proactively.<br/><br/>- Develop comprehensive dashboards and alerts that provide actionable insights and are tailored to business and technical requirements.<br/><br/>- Manage and optimize Datadog billing and resource usage to keep monitoring cost-effective.<br/><br/>- Integrate Datadog with incident management and collaboration tools such as PagerDuty, ServiceNow, Slack, and Jira to streamline alerting and resolution workflows.<br/><br/>- Collaborate with DevOps, SRE, and engineering teams to deploy Datadog Agents and build custom integrations for cloud platforms, including AWS, Azure, and Google Cloud Platform (GCP).<br/><br/>- Tune Linux systems, network configurations, and application performance to improve monitoring accuracy and response times.<br/><br/>- Extend Datadog functionality through custom plugins, scripts, and configurations as required.<br/><br/>- Analyze system and application logs to detect anomalies and support system health and security monitoring.<br/><br/>- Provide expert-level guidance on application platforms, architecture, and monitoring best practices, covering networking, databases, runtime environments, and user interfaces.<br/><br/>
- Develop and maintain technical documentation for Datadog implementations and monitoring standards.<br/><br/>- Communicate effectively with stakeholders, troubleshoot complex issues, and recommend resolutions.<br/><br/>- Stay current with the latest Datadog features, cloud technologies, and monitoring industry trends.<br/><br/>- Automate monitoring deployment and configuration tasks using Ansible or similar configuration management tools.<br/><br/>- Use scripting skills in Python or Node.js to enhance monitoring workflows and automation.<br/><br/><b>Required Skills and Qualifications:</b><br/><br/>- 4+ years of experience designing, implementing, and managing Datadog monitoring solutions.<br/><br/>- Strong hands-on experience with Datadog products: Infrastructure Monitoring, APM, RUM, Logs, Synthetics, Cloud Monitoring, Database Monitoring, Network Monitoring, and Security Monitoring.<br/><br/>- Deep understanding of distributed tracing concepts, including spans and traces.<br/><br/>- Expertise in creating interactive, insightful dashboards and configuring alerting systems.<br/><br/>- Experience integrating Datadog with ITSM and incident management tools such as PagerDuty, ServiceNow, Slack, and Jira.<br/><br/>- Proficiency with the AWS, Azure, and GCP cloud platforms, including deployment and monitoring strategies.<br/><br/>- Strong knowledge of Linux operating systems, networking, and system performance tuning.<br/><br/>- Familiarity with scripting in Python and Node.js to build custom monitoring solutions and automation.<br/><br/>- Working knowledge of Ansible or similar automation/configuration management tools.<br/><br/>- Solid understanding of application architecture, including databases, middleware, front-end/back-end layers, and networking.<br/><br/>- Excellent communication, teamwork, and problem-solving skills.<br/><br/>- Ability to work both independently and collaboratively in a fast-paced, agile environment.</p><br/></p> (ref:hirist.tech)