Job Description
<p><p><b>Description : </b><br/><br/>We are looking for a skilled Gen AI Platform Engineer to join our team.
The ideal candidate will have 10 years of experience in managing LLM-based systems, with expertise in infrastructure management, prompt versioning, fine-tuning, and deployment.
</p><p><br/></p><p>This role requires a strong understanding of GenAI workloads, performance tuning, scalability, and governance in cloud environments such as AWS, Azure, and Google Cloud.
The engineer will play a pivotal role in optimizing the performance of AI systems and ensuring their scalability in production while building and deploying AI use cases and solutions using these platforms and tools.<br/><br/><b>Key Responsibilities : </b><br/><br/>- Manage and oversee the infrastructure for LLM-based systems, ensuring seamless operation and scalability.<br/><br/></p><p>- Fine-tune, evaluate, and deploy prompt-based models, leveraging industry-standard tools and platforms.<br/><br/></p><p>- Ensure the performance, scalability, and governance of GenAI workloads in cloud environments (AWS, Azure, Google Cloud).<br/><br/></p><p>- Build and deploy AI use cases and solutions using the respective platforms and tools.<br/><br/></p><p>- Collaborate with cross-functional teams to ensure effective deployment and performance optimization.<br/><br/></p><p>- Lead the evaluation and enhancement of LLM-based models through iterative testing and fine-tuning.<br/><br/></p><p>- Handle deployment pipelines, including CI/CD for LLM models.<br/><br/></p><p>- Contribute to setting up automated processes for model fine-tuning and versioning.<br/><br/></p><p>- Work on optimizing cloud-based infrastructure to support the growth of GenAI workloads.<br/><br/><b>Required Skills : </b><br/><br/>- Strong experience with cloud platforms such as AWS SageMaker, Google Vertex AI, or Azure AI.<br/><br/></p><p>- Proficiency in handling LLM systems, prompt fine-tuning, and versioning.<br/><br/></p><p>- Hands-on experience with infrastructure management, model deployment, and optimization.<br/><br/></p><p>- Strong understanding of cloud architecture, performance, and scalability for GenAI workloads.<br/><br/></p><p>- Proficiency in Python, SQL, and Bash scripting.<br/><br/></p><p>- Experience with machine learning frameworks such as Hugging Face, TensorFlow, and PyTorch.<br/><br/></p><p>- Familiarity with CI/CD pipelines, Docker, Kubernetes, and MLOps workflows.<br/><br/></p><p>- Strong 
analytical skills and the ability to troubleshoot complex infrastructure issues.<br/><br/><b>Nice to Have Skills : </b><br/><br/>- Familiarity with NLP frameworks and libraries such as Hugging Face, TensorFlow, and PyTorch.<br/><br/></p><p>- Experience in working with large-scale data processing frameworks such as Apache Spark and Hadoop.<br/><br/></p><p>- Knowledge of model explainability and interpretability techniques for LLMs.<br/><br/></p><p>- Familiarity with containerization technologies (e.g., Docker, Kubernetes) for model deployment and orchestration.<br/><br/></p><p>- Hands-on experience with MLOps pipelines.<br/><br/><b>Soft Skills : </b><br/><br/>- Strong communication skills to collaborate with cross-functional teams and stakeholders.<br/><br/></p><p>- Problem-solving mindset with an ability to quickly address infrastructure or performance issues.<br/><br/></p><p>- Attention to detail and a strong focus on quality and best practices.<br/><br/></p><p>- Ability to work in a fast-paced, dynamic environment with changing priorities.<br/><br/></p><p>- Strong analytical skills, with a focus on data-driven decision-making.<br/><br/><b>Tools & Technical Skills : </b><br/><br/>- Platforms : AWS SageMaker, Google Vertex AI, Azure AI.<br/><br/></p><p>- Tools : Docker, Kubernetes, Terraform, Jenkins (CI/CD), MLflow.<br/><br/></p><p>- Languages : Python, SQL, Bash scripting.<br/><br/></p><p>- Frameworks : Hugging Face, TensorFlow, PyTorch, Keras.<br/><br/></p><p>- Databases : MySQL, PostgreSQL, NoSQL (MongoDB, Cassandra).<br/><br/></p><p>- Other : Git, GitHub, CloudFormation.<br/><br/><b>Education Qualification & Experience : </b><br/><br/>- Education : A degree in Data Science, Computer Science, Engineering, or a related field.
A Master's degree is a plus.<br/><br/></p><p>- Certification : AWS, Azure, or Google Cloud certification is preferred.<br/><br/></p><p>- Experience : 4-6 years of relevant experience in AI/ML infrastructure management, with hands-on expertise in LLM systems, prompt versioning, and fine-tuning in a cloud environment.</p><br/></p> (ref:hirist.tech)