Job Description
<p><p><b>JD:</b><br/><br/><b>What you'll do:</b></p><p><br/></p><p>- Build LLM apps: Design APIs, microservices, and UIs that use function calling, tools, and streaming responses.<br/><br/>- RAG pipelines: Ingest and clean data, chunk it and generate embeddings, set retrieval strategies (BM25/hybrid), and tune for relevance and latency.<br/><br/>- Prompt and policy engineering: Craft prompts, guardrails, and safety checks (PII redaction, jailbreak defense).<br/><br/>- Model ops: Integrate managed (Azure OpenAI) and open-source (Llama, Mistral) models; choose and optimize runtimes (vLLM/Triton).<br/><br/>- Evaluation and quality: Establish automatic evals (correctness, toxicity, hallucination, latency, cost/token); build golden test sets and CI gates.<br/><br/>- Observability: Add tracing, metrics, and logs (OpenTelemetry); set error budgets and SLOs.<br/><br/>- Security and compliance: Manage secrets and RBAC, data residency, and audit trails; align with SOC 2/GDPR.<br/><br/>- Cost control: Apply token budgeting, caching, batching, and quantization/LoRA where appropriate.<br/><br/>- Collaboration: Partner with Product, SecOps, and FinOps; review PRs and mentor junior engineers.<br/><br/><b>Minimum qualifications:</b></p><p><br/></p><p>- 4-5 years of software engineering (Python or TypeScript) shipping production services.<br/><br/>- Hands-on LLM experience (1-2 years): built at least one production feature using OpenAI/Azure OpenAI/Bedrock/Vertex or OSS models.<br/><br/>- Experience building RAG with a vector DB (Pinecone, Redis, pgvector, Weaviate, Milvus) and embedding models.<br/><br/>- Solid with APIs (REST/GraphQL), Git, testing, and CI/CD (GitHub Actions/Azure DevOps).<br/><br/>- Cloud fundamentals on Azure/AWS/GCP and containers (Docker, Kubernetes basics).<br/><br/>- Clear written and verbal communication; comfort with docs and design reviews.<br/><br/><b>Nice to have:</b></p><p><br/></p><p>- Agent frameworks (LangChain, LlamaIndex, Semantic Kernel, OpenAI Assistants), tools, and MCPs.<br/><br/>- Evals frameworks (Ragas, DeepEval,
Promptfoo), A/B testing, offline/online metrics.<br/><br/>- Fine-tuning/LoRA, distillation, quantization; DSPy; retrieval re-ranking.<br/><br/>- Event systems (Kafka), queues (SQS), and caching layers.<br/><br/>- Frontend familiarity (React/Next.js) for rapid prototyping.<br/><br/><b>Tech stack (example):</b></p><p><br/></p><p>- Models: Azure OpenAI (GPT-4.x)<br/><br/>- Orchestration: OpenAI Assistants<br/><br/>- Data/RAG: Azure Cognitive Search<br/><br/>- Pipelines: GitHub Actions, Docker, Kubernetes/AKS, Terraform (AVM)<br/><br/>- Observability: OpenTelemetry, Grafana<br/><br/>- Testing/Evals: PyTest, SonarCloud</p><br/></p> (ref:hirist.tech)