India Jobs Expertini

Principal Site Reliability Engineer, AI Infrastructure Job Opening In Bengaluru – Now Hiring NVIDIA


Job description

NVIDIA is widely considered to be one of the technology world’s most desirable employers.

We have some of the most forward-thinking and hardworking people in the world working for us.

If you're creative and autonomous, we want to hear from you! NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for over 30 years.

It’s an outstanding legacy of innovation that’s fueled by phenomenal technology and exceptional people.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.

Doing what’s never been done before takes vision, innovation, and exceptional talent.

As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

Come join the team and see how you can make a lasting impact on the world.



What You Will Be Doing:
+ Architect, lead, and scale globally distributed production systems supporting AI/ML, HPC, and critical engineering platforms across hybrid and multi-cloud environments.
+ Design and lead implementation of automation frameworks that reduce manual tasks, promote resilience, and uphold standard methodologies for system health, change safety, and release velocity.
+ Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for sophisticated distributed systems.
+ Lead cross-organizational efforts to assess operational maturity, address systemic risks, and establish long-term reliability strategies in collaboration with engineering, infrastructure, and product teams.
+ Pioneer initiatives that influence NVIDIA’s AI platform roadmap, participating in co-development efforts with internal partners and external vendors, and staying ahead of academic and industry advances.
+ Publish technical insights (papers, patents, whitepapers) and drive innovation in production engineering and system design.
+ Lead and mentor global teams in a technical capacity, participating in recruitment, design reviews, and developing standard methodologies in incident response, observability, and system architecture.



What We Need to See:
+ 15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure, with a strong track record of leading platform-scale efforts and high-impact programs.
+ Deep expertise in Linux/Unix systems engineering and public/private cloud platforms (AWS, GCP, Azure, OCI).
+ Expert-level programming in Python and one or more languages such as C++, Go or Rust.
+ Demonstrated experience with Kubernetes at scale, CPU/GPU scheduling, microservice orchestration, and container lifecycle management in production.
+ Hands-on expertise in observability frameworks (Prometheus, Grafana, ELK, Loki, etc.) and Infrastructure as Code (Terraform, CDK, Pulumi).
+ Proficiency in Site Reliability Engineering concepts like error budgets, SLOs, distributed tracing, and architectural fault tolerance.
+ Ability to influence multi-functional collaborators and drive technical decisions through effective written and verbal communication.
+ Proven track record of completing long-term, forward-looking platform strategies.
+ Degree in Computer Science or related field, or equivalent experience.



Ways to Stand Out from the Crowd:
+ Hands-on experience building platforms for large-scale AI training, inferencing, and data movement pipelines.
+ Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and orchestration frameworks (e.g., Ray, Kubeflow).
+ Expertise in hardware fleet observability, predictive failure analysis, and power/resource-aware scheduling.
+ Experience leading operational readiness efforts and reliability engineering in GPU-heavy environments.
+ Track record of driving cultural improvements in incident management, root cause analysis, and postmortem processes across large teams.



Join us and build the infrastructure that powers the world’s most advanced AI.

Apply now and make your mark at NVIDIA! Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.





Required Skill Profession: Other General

Unlock Your Principal Site Potential: Insight & Career Growth Guide


Real-time Principal Site Jobs Trends (Graphical Representation)

Explore profound insights with Expertini's real-time, in-depth analysis, showcased through the graph here. Uncover the dynamic job market trends for Principal Site in Bengaluru, India, highlighting market share and opportunities for professionals in Principal Site roles.

37617 Jobs in India
4198 Jobs in Bengaluru

Are You Looking for Principal Site Reliability Engineer, AI Infrastructure Job?

Great news! NVIDIA is currently hiring and seeking a Principal Site Reliability Engineer, AI Infrastructure to join their team. Feel free to download the job details.


The Work Culture

An organization's rules and standards set how people should be treated in the office and how different situations should be handled. The work culture at NVIDIA adheres to the cultural norms as outlined by Expertini.

The fundamental ethical values are:

1. Independence

2. Loyalty

3. Impartiality

4. Integrity

5. Accountability

6. Respect for human rights

7. Obeying Indian laws and regulations

What Is the Average Salary Range for Principal Site Reliability Engineer, AI Infrastructure Positions?

The average salary range for a Principal Site Reliability Engineer, AI Infrastructure varies, but the pay scale is rated "Standard" in Bengaluru. Salary levels may vary depending on your industry, experience, and skills. It's essential to research and negotiate effectively. We advise reading the full job specification before proceeding with the application to understand the salary package.

What Are the Key Qualifications for Principal Site Reliability Engineer, AI Infrastructure?

Key qualifications for Principal Site Reliability Engineer, AI Infrastructure typically include skills in the "Other General" category and the qualifications and expertise listed in the job specification. The generic skills are mostly outlined by the employer. Be sure to check the specific job listing for detailed requirements and qualifications.

How Can I Improve My Chances of Getting Hired for Principal Site Reliability Engineer, AI Infrastructure?

To improve your chances of getting hired for Principal Site Reliability Engineer, AI Infrastructure, consider enhancing your skills. Check your CV/Résumé score with our free built-in Resume Scoring tool, which gives you a matching score for each job once your CV/Résumé is uploaded. This can help you align your CV/Résumé with the job requirements and identify skills to strengthen.

Interview Tips for Principal Site Reliability Engineer, AI Infrastructure Job Success

NVIDIA interview tips for Principal Site Reliability Engineer, AI Infrastructure

Here are some tips to help you prepare for and ace your Principal Site Reliability Engineer, AI Infrastructure job interview:

Before the Interview:

Research: Learn about NVIDIA's mission, values, products, and the specific job requirements, and get further information about other openings.

Practice: Prepare answers to common interview questions and rehearse using the STAR method (Situation, Task, Action, Result) to showcase your skills and experiences.

Dress Professionally: Choose attire appropriate for the company culture.

Prepare Questions: Show your interest by having thoughtful questions for the interviewer.

Plan Your Commute: Allow ample time to arrive on time and avoid feeling rushed.

During the Interview:

Be Punctual: Arrive on time to demonstrate professionalism and respect.

Make a Great First Impression: Greet the interviewer with a handshake, smile, and eye contact.

Confidence and Enthusiasm: Project a positive attitude and show your genuine interest in the opportunity.

Answer Thoughtfully: Listen carefully and take a moment to formulate clear and concise responses. Highlight relevant skills and experiences using the STAR method.

Ask Prepared Questions: Demonstrate curiosity and engagement with the role and company.

Follow Up: Send a thank-you email to the interviewer within 24 hours.

Additional Tips:

Be Yourself: Let your personality shine through while maintaining professionalism.

Be Honest: Don't exaggerate your skills or experience.

Be Positive: Focus on your strengths and accomplishments.

Body Language: Maintain good posture, avoid fidgeting, and make eye contact.

Turn Off Phone: Avoid distractions during the interview.

Final Thought:

To prepare for your Principal Site Reliability Engineer, AI Infrastructure interview at NVIDIA, research the company, understand the job requirements, and practice common interview questions.

Highlight your leadership skills, achievements, and strategic thinking abilities. Be prepared to discuss your experience with HR, including your approach to meeting targets as a team player. Additionally, review NVIDIA's products or services and be prepared to discuss how you can contribute to their success.

By following these tips, you can increase your chances of making a positive impression and landing the job!

How to Set Up Job Alerts for Principal Site Reliability Engineer, AI Infrastructure Positions

Setting up job alerts for Principal Site Reliability Engineer, AI Infrastructure is easy with India Jobs Expertini. Simply visit our job alerts page, enter your preferred job title and location, and choose how often you want to receive notifications. You'll get the latest job openings sent directly to your email for free.