Job description
 
We're Hiring! I'm excited to share that we're looking for an SRE and DevOps - ML Framework engineer to join our team at ITC Infotech.
Below is the JD for your reference.
Job Functions:   
● You will be a member of our AI Platform Team, supporting the next generation AI architecture for various research and engineering teams within the organization.
● You'll partner with vendors and the infrastructure engineering team for security and service availability 
● You'll fix production issues, including performance and functional issues, together with engineering teams, researchers, and data scientists 
● Diagnose and solve customer technical problems 
● Participate in training customers and prepare reports on customer issues 
● Be responsible for customer service improvements and recommend product improvements 
● Write support documentation 
● You'll design and implement zero-downtime processes and monitoring to deliver a highly available service (99.999% availability) 
● As a support engineer, find opportunities to automate as part of the problem management process, and create automation that prevents recurring issues 
● Define engineering excellence for operational maturity 
● You'll work together with AI platform developers to provide the CI/CD model to deploy and configure the production system automatically 
● Develop and follow standard operating processes for tools and automation development, including style guides, versioning practices, source control, branching and merging patterns, and advising other engineers on development standards 
● Deliver solutions that accelerate the activities phenomenal engineers would perform, through automation, deep domain expertise, and knowledge sharing 
Required Skills:  
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton 
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments 
● Experience with AI/ML model training and inferencing platforms is a big plus 
● Experience with LLM fine-tuning systems is a big plus 
● Debugging and triaging skills 
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals 
● Familiarity with DevOps practices and continuous testing 
● DevOps pipelines and automation: app deployment/configuration and performance monitoring 
● Test automation, Jenkins CI/CD 
● Excellent communication, presentation, and leadership skills to be able to work and collaborate with partners, customers and engineering teams 
● Well organized and able to manage multiple projects in a fast-paced and demanding environment 
● Good English speaking, reading, and writing skills.
Job Location: Bangalore 
If you're interested or know someone who might be a great fit, please reach out or apply.
 
                    
Required Skill Profession: IT Services and IT Consulting