Role Overview:
On behalf of one of our Tier-1 IT clients, we are looking for an AI Engineer with hands-on experience in building, fine-tuning, and optimizing LLM-based applications.
The ideal candidate will have solid expertise in RAG (Retrieval-Augmented Generation) architectures, parameter-efficient fine-tuning (e.g., LoRA), and model quantization techniques for deployment efficiency.
Key Responsibilities:
● Design, implement, and optimize end-to-end LLM-based solutions for real-world applications.
● Develop and maintain RAG pipelines integrating vector databases, embeddings, and retrieval techniques.
● Fine-tune pre-trained language models using LoRA or similar methods.
● Apply quantization and optimization strategies to deploy models efficiently in resource-constrained environments.
● Collaborate with data scientists, software engineers, and product teams to integrate AI features into production systems.
● Monitor, evaluate, and continuously improve model performance and reliability.
Required Skills:
● 3–5 years of experience in AI/ML development or applied NLP.
● Proficient in Python and frameworks such as PyTorch or TensorFlow.
● Strong understanding of LLM architectures (e.g., GPT, Llama, Falcon, Mistral).
● Experience with RAG frameworks (LangChain, LlamaIndex, or custom retrieval setups).
● Hands-on knowledge of LoRA and other PEFT methods, and of model quantization (GPTQ, AWQ, or similar).
● Familiarity with vector databases like FAISS, Pinecone, or ChromaDB.
● Good understanding of prompt engineering and evaluation techniques.
● Cloud deployment experience (AWS, Azure, or GCP) is an advantage.
Preferred Skills:
● Exposure to open-source models and fine-tuning pipelines.
● Experience integrating AI models into web or enterprise products.
● Knowledge of containerization and MLOps (Docker, Kubernetes, MLflow).