Key Responsibilities
Engineering Excellence
- Design, build, and optimize automated processes for provisioning infrastructure and applications via infrastructure-as-code
- Standardize and streamline build and release (CI/CD) pipelines
- Configure and administer platforms and services
- Support operational activities by ensuring that pipeline platforms and infrastructure are optimal, recoverable, and easily scalable to meet capacity demands
- Provide incident management oversight, including root cause analysis, stakeholder communications, and post-mortems, and manage preventative measures and resolutions
- Drive improvements in operational efficiency for all services
- Take an active part in deploying platform and pipeline artefacts
- Ensure security, high availability, and disaster recovery are always front of mind
- Continuously monitor cost and risk, with a view to reducing costs and mitigating risks
- Keep production and non-production environments in sync by aligning them on stable standards, code, and configurations
- Identify relevant emerging trends and build compelling cases for their adoption (e.g. tool selection)
- Participate in proofs of concept (PoCs), prototypes, and innovation spikes to establish a clear direction
Capabilities and Experience
Essential Experience
- Proficient with CI/CD toolchains (e.g. Azure DevOps, Jenkins, Git, Artifactory)
- Proficient in one or more scripting languages for automation (e.g. Bash, PowerShell, Python)
- Proficient in provisioning platforms via Infrastructure-as-Code (IaC) techniques (e.g. Terraform, YAML, Azure Resource Manager (ARM))
- Working experience configuring, securing, and administering platforms in Azure; knowledge of cloud infrastructure and networking principles (e.g. Azure PaaS, IaaS)
- Demonstrable knowledge of working with distributed data platforms (e.g. Azure Data Lake Storage (ADLS), data lakes)
- Experience working with vulnerability management and code-inspection tooling (e.g. Snyk, SonarQube)
- An “automation-first” mindset when building solutions, favouring self-healing and fault-tolerant designs that minimize manual intervention and downtime
Desirable Experience
- Experience building or maintaining an API-led, event-driven architecture (e.g. using Azure Event Grid, Azure Functions)
- Container image management for clusters (e.g. Azure Container Registry)
- Good understanding of network configuration: DNS, routing, VPNs, firewalls, and endpoint management
- Experience integrating vulnerability management for package dependencies and containers into deployment pipelines
- Experience implementing custom data observability, capturing telemetry to better understand the health of data and pipelines