Job Overview
Category
Computer Occupations
Join Brillio and advance your career in Computer Occupations
Job Description
SRE DevOps (ML Ops Role)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
● Experience with AI/ML model training and inferencing platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English speaking, reading, and writing ability.
SRE DevOps (Big Data Role)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
● Experience with AI/ML model training and inferencing platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English speaking, reading, and writing ability.
SRE DevOps (ML Flow)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python and C++.
● Hands-on experience with Ray (ray.io), including workload management, cluster deployment, distributed task scheduling, and troubleshooting.
● Ability to use the Ray Dashboard and CLI tools for monitoring, resource tracking, debugging distributed jobs, and resolving production issues.
● Knowledge of Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve, and Ray Data is a big plus.
● Experience integrating Ray with tools such as Airflow, MLflow, Dask, and DeepSpeed is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good English speaking, reading, and writing ability.
Brillio is actively hiring for this SRE DevOps Engineer position