Job description
SRE DevOps (MLOps Role)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
● Experience with AI/ML model training and inferencing platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good spoken and written English, including reading comprehension.
SRE DevOps (Big Data Role)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python.
● Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
● Ability to handle framework-related issues, version upgrades, and compatibility with data processing / model training environments.
● Experience with AI/ML model training and inferencing platforms is a big plus.
● Experience with LLM fine-tuning systems is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good spoken and written English, including reading comprehension.
SRE DevOps (ML Flow)
Required Skills:
● Demonstrated ability in designing, building, refactoring and releasing software written in Python and C++.
● Hands-on experience with Ray.io, including workload management, cluster deployment, distributed task scheduling, and troubleshooting.
● Ability to use Ray Dashboard and CLI tools for monitoring, resource tracking, debugging distributed jobs, and resolving production issues.
● Knowledge of Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve, and Ray Data is a big plus.
● Experience integrating Ray with tools such as Airflow, MLflow, Dask, or DeepSpeed is a big plus.
● Debugging and triaging skills.
● Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
● Familiarity with DevOps practices and continuous testing.
● DevOps pipelines and automation: application deployment, configuration, and performance monitoring.
● Test automation and Jenkins CI/CD.
● Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
● Well organized and able to manage multiple projects in a fast-paced, demanding environment.
● Good spoken and written English, including reading comprehension.
Required Skill Profession
Computer Occupations