We are looking for an L3 Support Engineer with deep expertise in Elasticsearch and the ELK stack to provide operational support and drive the stability, performance, and scalability of large-scale cluster deployments.
The ideal candidate will play a vital role in diagnosing complex system issues, optimizing performance, and ensuring the reliability of Elasticsearch clusters in production environments.
Key Roles and Responsibilities:
- Provide advanced (L3) technical support for Elasticsearch clusters and associated components (Kibana, Logstash, and Beats such as Metricbeat).
- Monitor, troubleshoot, and resolve critical production issues related to cluster performance, indexing latency, node failures, and data ingestion bottlenecks.
- Design and maintain ingestion pipelines using Logstash, Beats, or custom shippers to ensure reliable, real-time data processing.
- Optimize Elasticsearch performance by tuning shards, mappings, queries, analyzers, and index templates.
- Manage and implement Index Lifecycle Management (ILM), data retention policies, and archival strategies to maintain cluster health and storage efficiency.
- Perform root cause analysis and work with cross-functional teams to resolve systemic issues.
- Maintain operational documentation and provide knowledge transfer to internal teams.
- Participate in on-call/onsite support and contribute to issue resolution efforts.
- Adhere to high standards of work quality.
- Maintain the confidentiality, integrity, and availability of Vehere’s information assets, including business-critical information.
Skills and Experience:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field, or equivalent industry experience.
- 5–10+ years of hands-on experience managing Elasticsearch clusters in production (versions 7.x/8.x preferred).
- Expert understanding of Elasticsearch internals, including Lucene, inverted indexes, query planning, and data structures.
- Strong command of the Elasticsearch Query DSL, aggregations, filters, and performance tuning techniques.
- Experience designing scalable, resilient architecture for ELK/OpenSearch deployments across multi-node clusters.
- Proficiency in Logstash pipelines, custom grok patterns, and Beats agent configurations.
- Familiarity with Kibana or equivalent visualization tools for dashboarding and troubleshooting.
- In-depth knowledge of JVM tuning, garbage collection (GC) strategies, and heap memory optimization.
- Strong scripting skills in Python and shell (e.g., Bash) for automation, monitoring, and custom ingestion workflows.
- Exposure to DevOps practices, CI/CD pipelines, containerization (Docker), and orchestration tools (Kubernetes or similar), as well as experience with large-scale data sets, is a plus.