Job Description
<p>Architecting Test Systems :<br/><br/>- Architect test frameworks and infrastructure for validating microservices and infrastructure components in multi-cluster and hybrid-cloud environments.<br/><br/>- Oversee the design of complex test scenarios simulating production-like workloads, resource scaling, failure injection, and recovery across distributed clusters.<br/><br/>Automation & Scalability :<br/><br/>- Spearhead the development of scalable and maintainable test automation integrated with CI/CD (Jenkins, GitHub Actions, etc.).<br/><br/>- Leverage Kubernetes APIs, Helm, and service mesh tools to build comprehensive automation coverage, including system health, failover behavior, and network resilience.<br/><br/>- Promote test infrastructure-as-code (IaC) and drive its adoption on the team, ensuring the infrastructure code is repeatable, extensible, and reliable.
<br/><br/>Technical Proficiency :<br/><br/>- Deep understanding of Kubernetes internals, cluster lifecycle management, Helm, service meshes (e.g., Istio or Linkerd), and network policies.<br/><br/>- Strong scripting and automation capabilities (Python, Pytest, Bash, etc.).<br/><br/>- Familiarity with observability stacks (Prometheus, Grafana, Jaeger), Kubernetes security (RBAC, secrets management), and performance benchmarking tools (e.g., K6).<br/><br/>- Solid grounding in cloud architecture (AWS, Azure, GCP), infrastructure provisioning, and containerized CI/CD.<br/><br/>- Moderate to advanced Linux proficiency is required : Bash scripting and debugging, systemd/logs, networking/firewalling/routing, certificate/PKI management, containers (Docker/containerd), and Kubernetes tooling (kubectl/Helm with OCI registries, GitOps/Flux) to install, test, and troubleshoot multi-cluster environments.<br/><br/>Automation & Scalability :<br/><br/>- Spearhead the design and development of highly scalable, maintainable test automation systems, seamlessly integrated into CI/CD pipelines (Jenkins, GitHub Actions, GitLab, ArgoCD).<br/><br/>- Leverage Kubernetes APIs, Helm charts, and service mesh frameworks (Istio, Linkerd) to enable full automation coverage for system health monitoring, network resilience testing, failover validation, and scaling scenarios.<br/><br/>- Advocate and implement Test Infrastructure-as-Code (IaC), ensuring all test systems are repeatable, auditable, extensible, and reliable.
Drive the adoption of GitOps practices for test environments.<br/><br/>Technical Proficiency :<br/><br/>- Kubernetes & Cloud Expertise : Deep understanding of Kubernetes internals, cluster lifecycle management, network policies, Helm, and service meshes, combined with hands-on experience across major cloud platforms (AWS, Azure, GCP).<br/><br/>- Observability & Monitoring : Skilled in designing observability pipelines using Prometheus, Grafana, and Jaeger, with a strong focus on proactive monitoring, tracing, and alerting for distributed systems.<br/><br/>- Scripting & Automation : Proficient in Python (Pytest, automation frameworks), Bash scripting, and DevOps toolchains for streamlined automation and test orchestration.<br/><br/>- Security & Reliability : Familiar with RBAC, PKI management, secrets handling, and network security policies for production-grade Kubernetes clusters.<br/><br/>- System Benchmarking : Experienced in load and performance benchmarking using K6, Locust, and custom benchmarking harnesses to validate system throughput, latency, and resilience under stress.<br/><br/>- Linux & System Proficiency : Solid grounding in Linux administration, including systemd management, firewalling/routing, container runtimes (Docker, containerd), networking diagnostics, and log debugging.</p> (ref:hirist.tech)