SRE & DevOps (ML Framework) - AI Platform
Location: Bangalore
Mode: Hybrid
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
● Experience with AI/ML model training and inferencing platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: app deployment/configuration and performance monitoring.
● Test automation, Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English speaking, reading, and writing skills.