Teamware Solutions is seeking a skilled Monitoring and Observability Engineer with expertise in Java application performance, AppDynamics, SiteScope, Splunk, Grafana, and the ELK Stack.
You'll play a vital role in ensuring the health, performance, and availability of our critical applications and infrastructure through comprehensive monitoring, proactive alerting, and in-depth analysis.
Key Responsibilities
- Implement, configure, and manage monitoring solutions using AppDynamics for Application Performance Monitoring (APM) of Java applications.
- Deploy and maintain infrastructure monitoring with SiteScope, collecting metrics from various servers and systems.
- Design and build centralized logging and analytics platforms using the ELK Stack (Elasticsearch, Logstash, Kibana).
- Develop dashboards and alerts in Grafana to visualize key performance indicators and system metrics.
- Use Splunk for log management, security event monitoring, and operational intelligence.
- Analyze performance data, identify bottlenecks, and provide insights to development and operations teams for optimization.
- Configure proactive alerting and notification systems to ensure rapid response to potential issues.
- Collaborate with development, DevOps, and operations teams to integrate monitoring tools into the CI/CD pipeline and improve observability practices.
Qualifications
- Proven experience as a Monitoring Engineer, Observability Specialist, or SRE with a focus on application and infrastructure monitoring.
Skills Required:
- Strong hands-on experience with AppDynamics for Java application monitoring, including custom dashboards and alerting.
- Proficiency in managing and configuring SiteScope for infrastructure and system monitoring.
- Extensive experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for log aggregation, analysis, and visualization.
- Solid experience in developing dashboards, queries, and alerts using Grafana.
- Expertise in Splunk for log management, searching, and creating reports/dashboards.
- Strong understanding of Java application architecture and common performance metrics.
- Familiarity with Unix/Linux operating systems for server monitoring.
- Excellent analytical, problem-solving, and troubleshooting skills in complex IT environments.
Preferred Skills:
- Experience with other APM tools (e.g., Dynatrace, New Relic).
- Knowledge of cloud monitoring services (e.g., AWS CloudWatch, Azure Monitor).
- Proficiency in scripting (e.g., Python, Shell) for automation of monitoring tasks.
- Familiarity with ITIL processes and SRE principles.