Job description
Location: Remote (India-based candidates preferred)
Employment: Full-time (Hired through VS Technology, working on ServeFirst product)
ServeFirst is a scrappy, well-funded AI startup (recently closed a £4.5m seed round) that’s redefining how businesses understand their customers.
We transform raw customer feedback into real-time insights for companies that care deeply about CX.
Our 2025 roadmap is ambitious:
- Break apart our Node.js monolith into scalable microservices
- Double our AI-driven workflows
- Harden our infra to handle 100× traffic growth
- Maintain a culture where everyone ships, everyone is on-call, bureaucracy is nil, and velocity is high
This is a chance to join early, own core architecture decisions, and scale AI-first systems in production.
What You’ll Do
- Break up the monolith: Define service boundaries, transition to microservices, and implement REST + SQS communication, containerised on ECS Fargate.
- Own AI workflow integration: Orchestrate AI workflows with OpenAI/Claude APIs and frameworks like LangChain. Design multi-model pipelines (prompt orchestration, vector DBs, caching, telemetry) and lead our shift toward Bedrock/RAG-native infra.
- Build and scale AI infra: Stand up inference/workflow infra on AWS (Bedrock, SageMaker, containers). Ensure observability, security, and cost efficiency.
- Architect the AI platform: Go beyond "just wrapping GPT." Define orchestration vs. inference boundaries, expose tracing & prompt history, and build systems for experimentation & scale.
- Champion testing and correctness: Enforce unit, integration, and load testing. Design clear contracts, mocks, and fast CI.
- Estimate, scope, deliver: Break complex features into milestones, highlight risks, and communicate trade-offs clearly.
- Make it observable: Implement structured logs, metrics, traces, and alerts. Make LLM behavior traceable (token usage, prompt mutation).
- Think security first: Handle sensitive customer data (PII, IAM, secrets management, GDPR readiness).
- Ship backend code: Work in Node.js/TypeScript, with MongoDB (Mongoose), Redis, and schedulers (cron/EventBridge).
- Keep infra flexible: AWS-first today, modular Terraform for GCP portability tomorrow.
- Mentor & raise the bar: Lead reviews, mentor engineers, and reinforce best practices while maintaining velocity.
Must-Haves
- 8+ years backend engineering; deep AWS + Node.js/TypeScript experience
- Strong system design skills (event-driven, autoscaling systems)
- Production LLM workflow experience (OpenAI, Claude, Bedrock, etc.)
- Hands-on with LangChain / agentic frameworks / workflow automation
- Proven track record of deploying/scaling AI infra (Bedrock, SageMaker, containers)
- MongoDB & Redis tuning/debugging expertise
- Fluent with Terraform & Docker
- Testing mindset with CI/CD pipelines
- Strong async communication (docs, writing, code clarity)
- Security awareness (IAM, GDPR, SOC2 basics)
- Demonstrated ability to scope, build, and deliver complex features
Nice-to-Haves
- CX / survey / analytics SaaS background
- Experience with Bedrock, LangChain, or RAG-native infra
- LLMOps exposure (prompt versioning, feedback loops, telemetry)
- GCP infra knowledge
- React familiarity / empathy for frontend engineers
- Incident response & blameless postmortems