Job Description
<p><p><b>Description :</b><br/><br/>We are looking for an experienced Cloud Platform Architect with deep expertise in networking, storage, and Kubernetes to design and implement a cloud platform at scale, similar to AWS/GCP/Azure.
The ideal candidate will have strong experience in infrastructure automation, distributed systems, and large-scale platform engineering, with the ability to architect and lead the development of multi-tenant, high-performance cloud services.<br/><br/><b>Key Responsibilities :</b><br/><br/><b>Cloud Platform Architecture :</b><br/><br/>- Design and implement a scalable cloud platform covering compute, storage, and networking layers.<br/><br/>- Define architecture for multi-cluster Kubernetes environments, ensuring high availability, scalability, and security.<br/><br/>- Build core services such as identity & access management, service discovery, observability, and API gateways.<br/><br/><b>Networking :</b><br/><br/>- Architect multi-tenant networking for VPC/VNet equivalents, load balancers, firewalls, and service meshes.<br/><br/>- Implement SDN solutions (Calico, Cilium, OVN, etc.) and network policy enforcement at scale.<br/><br/>- Optimize inter-cluster and inter-datacenter connectivity.<br/><br/><b>Storage :</b><br/><br/>- Design and manage distributed storage solutions (Ceph, Rook, OpenEBS, MinIO, Lustre).<br/><br/>- Architect persistent storage for Kubernetes (CSI drivers, snapshots, backup/restore).<br/><br/>- Ensure data availability, durability, and compliance with SLAs.<br/><br/><b>Kubernetes & Orchestration :</b><br/><br/>- Design multi-tenant Kubernetes platforms with advanced scheduling, security, and RBAC.<br/><br/>- Automate provisioning, scaling, and upgrades using operators, Helm, and GitOps (ArgoCD/Flux).<br/><br/>- Integrate with monitoring/logging (Prometheus, Grafana, Loki, ELK).<br/><br/><b>Automation & Infrastructure-as-Code :</b><br/><br/>- Implement full-stack automation with Terraform, Ansible, or Pulumi.<br/><br/>- Drive CI/CD pipelines for infrastructure and application delivery.<br/><br/>- Build self-service capabilities for internal teams.<br/><br/><b>Security & Compliance :</b><br/><br/>- Design security at all layers (network, storage, workloads).<br/><br/>- Implement secrets management (Vault, External Secrets, KMS).<br/><br/>- Ensure compliance with data governance and regulatory requirements.<br/><br/><b>Leadership :</b><br/><br/>- Collaborate with product and engineering teams to define the roadmap and priorities.<br/><br/>- Mentor and guide platform engineers and DevOps teams.<br/><br/>- Evaluate new technologies and contribute to open source where applicable.<br/><br/>
<b>Required Skills & Experience :</b><br/><br/>- Networking : Deep knowledge of TCP/IP, routing, load balancing, DNS, SDN (Calico, Cilium), and service mesh (Istio/Linkerd).<br/><br/>- Storage : Hands-on experience with distributed storage (Ceph, MinIO, Gluster, Rook) and Kubernetes storage orchestration (CSI).<br/><br/>- Kubernetes : 5+ years of experience; expert in multi-cluster deployments, operators, controllers, and service mesh.<br/><br/>- Cloud & Infra : Strong background in virtualization (KVM, VMware, OpenStack) and bare-metal automation (MAAS, Ironic, PXE, IPMI/Redfish).<br/><br/>- IaC & Automation : Proficiency in Terraform, Ansible, GitOps tools (ArgoCD, Flux).<br/><br/>- CI/CD : Experience with Jenkins, GitHub Actions, GitLab CI/CD.<br/><br/>- Programming/Scripting : Proficiency in Go, Python, or Bash.<br/><br/>- Monitoring/Observability : Prometheus, Grafana, Loki, ELK, Jaeger.<br/><br/>- Strong knowledge of distributed systems, high availability, and fault tolerance.<br/><br/><b>Preferred Qualifications :</b><br/><br/>- Experience designing cloud platforms at scale (e.g., internal private cloud, hyperscaler background).<br/><br/>- Contributions to the open-source Kubernetes ecosystem (CNCF projects).<br/><br/>- Familiarity with service billing, quota management, and multi-tenancy at scale.<br/><br/>- Exposure to bare-metal cloud orchestration (Metal3, Tinkerbell, Equinix Metal, Ironic).<br/><br/>- Strong leadership and architectural decision-making skills.</p><br/></p> (ref:hirist.tech)