Job Description
What You'll Do
- Collaborate with engineering teams to provide feedback and contribute code where needed, enhancing product functionality and resilience.
- Participate in on-call rotations to ensure 24x7 availability of services.
- Design and develop tools to support 24x7 follow-the-sun operations for critical production systems.
- Automate deployment tasks for core products and infrastructure, maintaining a robust automation framework.
- Monitor and optimize the performance of applications on the Guidewire Cloud Platform, ensuring reliability and efficiency.
- Develop and maintain observability tools, metrics, and dashboards, including self-healing mechanisms for increased reliability.
- Foster a culture of reliability by promoting blameless postmortems, SLO tracking, and continuous learning from incidents.
- Proactively identify and address infrastructure issues to minimize business impact.
- Develop system documentation and training materials to empower and educate team members.
Who You Are
- Skilled in programming with Python or Go for building internal tools, CLIs, and APIs; familiarity with Java and Spring Boot is a plus.
- Exceptional troubleshooting skills, with a proactive, critical approach to solving complex issues.
- Proficient in containerization technologies, with hands-on expertise in Docker, Helm, Kubernetes (EKS), CNI, and Ingress networking.
- Strong knowledge of Kubernetes concepts (Pods, Deployments, Services, StatefulSets, Ingress, etc.) and the Operator pattern.
- Experienced with Terraform, including developing and testing complex modules.
- Advanced experience with AWS, including custom tool development using the AWS SDK.
- Solid understanding of Single Sign-On (SSO), SAML, and OAuth protocols; experience with Okta is a bonus.
- Skilled in using observability tools such as Prometheus, OpenTelemetry, or Datadog for proactive monitoring.
- Background in supporting production systems at scale in a heavily microservice-based environment.
- Familiar with agile methodologies, including Scrum and Kanban, to enhance software development processes.
- Excellent communication skills, with the ability to explain complex technical concepts to diverse audiences.
Other Requirements
- Bachelor's degree in Computer Science or a related field.
- Ability to read, write, and speak English.
- On-call: we provide 24x7 support to our customers, so you will take turns with your teammates being on call for weekend production emergencies or providing rotating weekend operational support.
- Travel: expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.
Bonus Points
- Kubernetes or AWS certifications
- Contributions to open source projects
- Familiarity with KubeVela (OAM) or Crossplane for Kubernetes-native infrastructure management
- Experience managing large-scale Aurora PostgreSQL clusters and Aurora Serverless
- Experience with TeamCity CI or GitHub Actions
Skills Required
GitHub, PostgreSQL, AWS, SAML, Python