Job description
 
Role Overview
We are looking for an experienced MLOps Lead with deep expertise in the Azure and AWS cloud ecosystems who can design, deploy, and manage scalable AI/ML infrastructure.
The ideal candidate brings a strong background in cloud governance, GenAI tooling, automation, and CI/CD pipelines, along with hands-on experience across modern MLOps frameworks.
Key Responsibilities
Design, implement, and manage scalable cloud-based AI/ML infrastructure across Azure and AWS.
Drive the end-to-end MLOps lifecycle: model deployment, monitoring, retraining, and governance.
Enable GenAI and agentic AI platforms built on Azure OpenAI, Amazon Bedrock, Anthropic Claude, LangChain, and similar tools.
Implement CI/CD pipelines using Azure DevOps or AWS CodePipeline.
Ensure security, observability, and compliance across ML and GenAI ecosystems.
Manage infrastructure automation with Terraform, Bicep, CloudFormation, or similar IaC tools.
Collaborate with data science and engineering teams to optimize ML workflows, data pipelines, and API integrations.
Implement monitoring and alerting using Grafana, Prometheus, Azure Monitor, and Application Insights.
Oversee networking, identity management, and role-based access control (IAM, RBAC) across clouds.
Support model lifecycle management: drift monitoring, retraining, technical evaluation, and business validation.
Technical Skills & Expertise
Cloud & MLOps Platforms
Azure: Azure ML, Azure AI Services, Azure OpenAI, Azure Kubernetes Service (AKS), Databricks, Azure AI Search, Azure Blob Storage, Cosmos DB, Azure SQL, Azure Functions, Azure Event Hubs, Azure Resource Manager (ARM), Bicep.
AWS: SageMaker, Bedrock, Lambda, DynamoDB, S3, RDS, Redshift, ECR, CloudFormation, CDK, KMS, EventBridge, Step Functions.
AI/ML & Programming
Hands-on experience in Python, with exposure to TensorFlow, PyTorch, and scikit-learn.
Understanding of LLM tokenization, prompt injection risks, jailbreak prevention, and AI safety techniques.
Familiarity with LangChain, LlamaCloud, Azure AI Foundry, and related frameworks.
Experience with model monitoring, retraining, and evaluation workflows.
DevOps & Infrastructure
Expertise in CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure automation.
Strong skills in governance, audit logging, and security policies (Azure Policy, AWS SCPs, IAM).
Deep understanding of networking, DNS, load balancers, VNets/VPCs, and VPNs.
Skilled in IaC tools: Terraform, Bicep, ARM, and CloudFormation.
Monitoring & Observability
Experience with Grafana, Prometheus, Application Insights, Log Analytics workspaces, and Azure Monitor.
Security & Access Management
Understanding of Microsoft Active Directory (AD), least-privilege principles, IAM, and RBAC.
Testing & Automation
Familiarity with unit testing and integration testing in CI/CD workflows (preferably Azure DevOps).
Good to Have
Experience with Azure Bot Framework, Microsoft 365 Copilot, and Azure API Management (APIM).
Exposure to code assistants such as GitHub Copilot, Cursor, and Claude Code.
Knowledge of the Boto3 SDK (AWS SDK for Python) and TypeScript for IaC.
Preferred Background
Strong background in cloud infrastructure engineering and machine learning operations.
Proven ability to lead cross-functional teams and implement AI governance at scale.
Excellent problem-solving, communication, and documentation skills.
Required Skill Profession
Computer Occupations