Job Overview
Category
Computer Occupations
Join greytHR and advance your career in Computer Occupations
Job Description
About the Role
We are looking for a passionate and detail-oriented Site Reliability Engineer (SRE) to join our engineering team.
As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services.
You’ll work closely with development and QA teams to build, maintain, and scale production systems while implementing best practices for monitoring, automation, and incident management.
This position is ideal for engineers who thrive in complex distributed environments, are strong in databases and Kubernetes, and enjoy improving system reliability through automation and modern tooling.
Key Responsibilities
Infrastructure Reliability & Performance
Maintain, monitor, and improve uptime and performance of production systems.
Design and implement scalable, reliable, and secure infrastructure on cloud platforms (AWS / GCP).
Kubernetes & Containerization
Deploy, manage, and optimize containerized workloads using Kubernetes and Helm.
Troubleshoot Kubernetes clusters, pods, and networking issues.
Manage CI/CD pipelines integrated with Kubernetes-based deployments.
Database Administration
Manage and optimize databases (PostgreSQL, MongoDB, or other DBs).
Perform database tuning, backups, restores, and replication management.
Automate database monitoring and implement high availability (HA) strategies.
Monitoring & Incident Response
Participate in on-call rotations for production support and incident response.
Conduct post-incident reviews and drive preventive improvements.
Security & Compliance
Implement and enforce security best practices in infrastructure and application deployments.
Manage access controls, secrets, and network policies in production environments.
Collaboration & Continuous Improvement
Work with development teams to design systems with reliability and scalability in mind.
Drive automation and self-healing capabilities for common operational tasks.
Contribute to SRE playbooks, runbooks, and documentation.
Required Skills & Qualifications
Education: Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
Experience: 2–5 years of experience in an SRE, DevOps, or DBA role.
Core Skills:
Strong experience with Kubernetes, Docker, and container orchestration.
Hands-on experience with databases (MySQL, PostgreSQL, MongoDB, or similar).
Proficiency in Linux system administration and shell scripting.
Good knowledge of cloud platforms (AWS / GCP / Azure) and related services.
Basic understanding of networking concepts (DNS, Load Balancing, Firewalls, etc.).
Programming experience in Python, Go, or Bash for automation.
greytHR is actively hiring for this Site Reliability Engineer (SRE II) position