Job description
We're hiring! I'm excited to share that we're looking to fill the SRE and DevOps - ML Framework role on our team at ITC Infotech.
Below is the JD for your reference.
Job Functions:
● You will be a member of our AI Platform Team, supporting the next-generation AI architecture for various research and engineering teams within the organization.
● You'll partner with vendors and the infrastructure engineering team to ensure security and service availability
● You'll work with engineering teams, researchers, and data scientists to fix production issues, including performance and functional issues
● Diagnose and solve customer technical problems
● Participate in training customers and prepare reports on customer issues
● Be responsible for customer service improvements and recommend product improvements
● Write support documentation
● You'll design and implement zero-downtime practices and monitoring to achieve a highly available service (99.999%)
● As a support engineer, find opportunities to automate as part of the problem management process, and build automation that prevents issues
● Define engineering excellence for operational maturity
● You'll work together with AI platform developers to provide a CI/CD model that deploys and configures the production system automatically
● Develop and follow standard operating processes for tools and automation development, including style guides, versioning practices, source control, and branching and merging patterns, and advise other engineers on development standards
● Deliver solutions that accelerate the activities phenomenal engineers would perform, through automation, deep domain expertise, and knowledge sharing
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, and Triton
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments
● Experience with AI/ML model training and inferencing platforms is a big plus
● Experience with LLM fine-tuning systems is a big plus
● Debugging and triaging skills
● Cloud technologies such as Kubernetes and Docker, and Linux fundamentals
● Familiarity with DevOps practices and continuous testing
● DevOps pipelines and automation: app deployment and configuration, and performance monitoring
● Test automation and Jenkins CI/CD
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams
● Well organized and able to manage multiple projects in a fast-paced and demanding environment
● Good English speaking, reading, and writing skills.
Job Location: Bangalore
If you're interested or know someone who might be a great fit, please reach out or apply.