                Join Brillio and advance your career in PRB
            Job Description
            
                SRE DevOps (ML Ops Role)
Required Skills:
● Demonstrated ability to design, build, refactor, and release software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, and Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments.
● Experience with AI/ML model training and inference platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triage skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment/configuration and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills to work and collaborate with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English skills (speaking, reading, and writing).
SRE DevOps (Big Data Role)
Required Skills:
● Demonstrated ability to design, build, refactor, and release software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, and Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments.
● Experience with AI/ML model training and inference platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triage skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment/configuration and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills to work and collaborate with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English skills (speaking, reading, and writing).
SRE DevOps (ML Flow)
Required Skills:
● Demonstrated ability to design, build, refactor, and release software written in Python and C++.
● Hands-on experience with Ray.io, including workload management, cluster deployment, distributed task scheduling, and troubleshooting.
● Ability to use the Ray Dashboard and CLI tools for monitoring, resource tracking, debugging distributed jobs, and resolving production issues.
● Knowledge of Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve, and Ray Data is a big plus.
● Experience integrating Ray with tools such as Airflow, MLflow, Dask, and DeepSpeed is a big plus.
● Debugging and triage skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment/configuration and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills to work and collaborate with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English skills (speaking, reading, and writing).
                Brillio is actively hiring for this SRE DevOps Engineer position