Job Description
Location: Remote (India-based candidates preferred)
Employment: Full-time (hired through VS Technology, working on the ServeFirst product)
ServeFirst is a scrappy, well-funded AI startup (recently closed a £4.5m seed round) that’s redefining how businesses understand their customers.
We transform raw customer feedback into real-time insights for companies that care deeply about CX.
Our 2025 roadmap is ambitious:
Break apart our Node.js monolith into scalable microservices
Double our AI-driven workflows
Harden our infra to handle 100× traffic growth
Maintain a culture where everyone ships, everyone is on-call, bureaucracy is nil, and velocity is high
This is a chance to join early, own core architecture decisions, and scale AI-first systems in production.
What You’ll Do
Break up the monolith
Define service boundaries, transition to microservices, and implement REST + SQS communication, containerised on ECS Fargate.
Own AI workflow integration
Orchestrate AI workflows with OpenAI/Claude APIs and frameworks like LangChain.
Design multi-model pipelines (prompt orchestration, vector DBs, caching, telemetry) and lead our shift toward Bedrock/RAG-native infra.
Build and scale AI infra
Stand up inference/workflow infra on AWS (Bedrock, SageMaker, containers).
Ensure observability, security, and cost efficiency.
Architect the AI platform
Go beyond “just wrapping GPT.” Define orchestration vs. inference boundaries, expose tracing & prompt history, and build systems for experimentation & scale.
Champion testing and correctness
Enforce unit, integration, and load testing.
Design clear contracts, mocks, and fast CI.
Estimate, scope, deliver
Break complex features into milestones, highlight risks, and communicate trade-offs clearly.
Make it observable
Implement structured logs, metrics, traces, and alerts.
Make LLM behavior traceable (token usage, prompt mutation).
Think security first
Handle sensitive customer data (PII, IAM, secrets management, GDPR readiness).
Ship backend code
Work in Node.js/TypeScript, with MongoDB (Mongoose), Redis, and schedulers (cron/EventBridge).
Keep infra flexible
AWS-first today, modular Terraform for GCP portability tomorrow.
Mentor & raise the bar
Lead reviews, mentor engineers, and reinforce best practices while maintaining velocity.
✅ Must-Haves
8+ years backend engineering; deep AWS + Node.js/TypeScript experience
Strong system design skills (event-driven, autoscaling systems)
Production LLM workflow experience (OpenAI, Claude, Bedrock, etc.)
Hands-on with LangChain / agentic frameworks / workflow automation
Proven track record of deploying and scaling AI infra (Bedrock, SageMaker, containers)
MongoDB & Redis tuning/debugging expertise
Fluent with Terraform & Docker
Testing mindset with CI/CD pipelines
Strong async communication (docs, writing, code clarity)
Security awareness (IAM, GDPR, SOC2 basics)
Demonstrated ability to scope, build, and deliver complex features
Nice-to-Haves
CX / survey / analytics SaaS background
Experience with Bedrock, LangChain, or RAG-native infra
LLMOps exposure (prompt versioning, feedback loops, telemetry)
GCP infra knowledge
React familiarity / empathy for frontend engineers
Incident response & blameless postmortems