Role Overview:
We’re looking for an AI Engineer with hands-on experience in building, fine-tuning, and optimizing LLM-based applications to work with one of our Tier-1 IT clients.
The ideal candidate will have solid expertise in RAG (Retrieval-Augmented Generation) architectures, parameter-efficient fine-tuning (e.g., LoRA), and model quantization techniques for deployment efficiency.
Key Responsibilities:
●     Design, implement, and optimize end-to-end LLM-based solutions for real-world applications.
●     Develop and maintain RAG pipelines integrating vector databases, embeddings, and retrieval techniques.
●     Fine-tune pre-trained language models using LoRA or similar methods.
●     Apply quantization and optimization strategies to deploy models efficiently in resource-constrained environments.
●     Collaborate with data scientists, software engineers, and product teams to integrate AI features into production systems.
●     Monitor, evaluate, and continuously improve model performance and reliability.
Required Skills:
●     3–5 years of experience in AI/ML development or applied NLP.
●     Proficiency in Python and frameworks such as PyTorch or TensorFlow.
●     Strong understanding of LLM architectures (e.g., GPT, Llama, Falcon, Mistral).
●     Experience with RAG frameworks (LangChain, LlamaIndex, or custom retrieval setups).
●     Hands-on knowledge of LoRA, PEFT, and model quantization (GPTQ, AWQ, or similar).
●     Familiarity with vector databases like FAISS, Pinecone, or ChromaDB.
●     Good understanding of prompt engineering and evaluation techniques.
●     Cloud deployment experience (AWS, Azure, or GCP) is an advantage.
Preferred Skills:
●     Exposure to open-source models and fine-tuning pipelines.
●     Experience integrating AI models into web or enterprise products.
●     Knowledge of containerization and MLOps (Docker, Kubernetes, MLflow).