Principal Site Reliability Engineer, AI Infrastructure Job Opening In Bengaluru – Now Hiring NVIDIA

Job description

NVIDIA is widely considered to be one of the technology world’s most desirable employers.

We have some of the most forward-thinking and hardworking people in the world working for us.

If you're creative and autonomous, we want to hear from you! NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for over 30 years.

It’s an outstanding legacy of innovation that’s fueled by phenomenal technology and exceptional people.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing.

An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.

Doing what’s never been done before takes vision, innovation, and exceptional talent.

As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

Come join the team and see how you can make a lasting impact on the world.

What You Will Be Doing:
+ Architect, lead, and scale globally distributed production systems supporting AI/ML, HPC, and critical engineering platforms across hybrid and multi-cloud environments.
+ Design and lead implementation of automation frameworks that reduce manual tasks, promote resilience, and uphold standard methodologies for system health, change safety, and release velocity.
+ Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for sophisticated distributed systems.
+ Lead cross-organizational efforts to assess operational maturity, address systemic risks, and establish long-term reliability strategies in collaboration with engineering, infrastructure, and product teams.
+ Pioneer initiatives that influence NVIDIA’s AI platform roadmap, participating in co-development efforts with internal partners and external vendors, and staying ahead of academic and industry advances.
+ Publish technical insights (papers, patents, whitepapers) and drive innovation in production engineering and system design.
+ Lead and mentor global teams in a technical capacity, participating in recruitment, design reviews, and developing standard methodologies in incident response, observability, and system architecture.

What We Need to See:
+ 15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure, with a strong track record of leading platform-scale efforts and high-impact programs.
+ Deep expertise in Linux/Unix systems engineering and public/private cloud platforms (AWS, GCP, Azure, OCI).
+ Expert-level programming in Python and one or more languages such as C++, Go or Rust.
+ Demonstrated experience with Kubernetes at scale, CPU/GPU scheduling, microservice orchestration, and container lifecycle management in production.
+ Hands-on expertise in observability frameworks (Prometheus, Grafana, ELK, Loki, etc.) and Infrastructure as Code (Terraform, CDK, Pulumi).
+ Proficiency in Site Reliability Engineering concepts like error budgets, SLOs, distributed tracing, and architectural fault tolerance.
+ Ability to influence multi-functional collaborators and drive technical decisions through effective written and verbal communication.
+ Proven track record of completing long-term, forward-looking platform strategies.
+ Degree in Computer Science or a related field, or equivalent experience.

Ways to Stand Out from the Crowd:
+ Hands-on experience building platforms for large-scale AI training, inferencing, and data movement pipelines.
+ Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and orchestration frameworks (e.g., Ray, Kubeflow).
+ Expertise in hardware fleet observability, predictive failure analysis, and power/resource-aware scheduling.
+ Experience leading operational readiness efforts and reliability engineering in GPU-heavy environments.
+ Track record of driving cultural improvements in incident management, root cause analysis, and postmortem processes across large teams.

Join us and build the infrastructure that powers the world’s most advanced AI.

Apply now and make your mark at NVIDIA! Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.

Required Skill Profession

Other General

Job Details

Company: NVIDIA
Location: Bengaluru, India
Job Type: Full-time
Salary: Undisclosed
Posted: 2025-10-13
Status: Active
Industry: Other General

Unlock Your Principal Site Potential: Insight & Career Growth Guide

Real-time Principal Site Jobs Trends (Graphical Representation)

Explore profound insights with Expertini's real-time, in-depth analysis, showcased through the graph here. Uncover the dynamic job market trends for Principal Site in Bengaluru, India, highlighting market share and opportunities for professionals in Principal Site roles.

37617 Jobs in India

4198 Jobs in Bengaluru

Are You Looking for a Principal Site Reliability Engineer, AI Infrastructure Job?

Great news! NVIDIA is currently hiring and seeking a Principal Site Reliability Engineer, AI Infrastructure to join their team. Feel free to download the job details.

Wait no longer! Are you also interested in exploring similar jobs? Search now: Principal Site Reliability Engineer, AI Infrastructure Jobs Bengaluru.

The Work Culture

An organization's rules and standards set how people should be treated in the office and how different situations should be handled. The work culture at NVIDIA adheres to the cultural norms as outlined by Expertini.

The fundamental ethical values are:

1. Independence

2. Loyalty

3. Impartiality

4. Integrity

5. Accountability

6. Respect for human rights

7. Obeying Indian laws and regulations

What Is the Average Salary Range for Principal Site Reliability Engineer, AI Infrastructure Positions?

The average salary range for Principal Site Reliability Engineer, AI Infrastructure jobs in India varies, but the pay scale is rated "Standard" in Bengaluru. Salary levels may vary depending on your industry, experience, and skills. It's essential to research and negotiate effectively. We advise reading the full job specification before proceeding with the application to understand the salary package.

What Are the Key Qualifications for Principal Site Reliability Engineer, AI Infrastructure?

Key qualifications for Principal Site Reliability Engineer, AI Infrastructure typically include Other General and a list of qualifications and expertise as mentioned in the job specification. The generic skills are mostly outlined by the Bureau of Labor. Be sure to check the specific job listing for detailed requirements and qualifications.

How Can I Improve My Chances of Getting Hired for Principal Site Reliability Engineer, AI Infrastructure?

To improve your chances of getting hired for Principal Site Reliability Engineer, AI Infrastructure, consider enhancing your skills. Check your CV/Résumé Score with our free Resume Scoring Tool. We have an in-built Resume Scoring tool that gives you the matching score for each job based on your CV/Résumé once it is uploaded. This can help you align your CV/Résumé according to the job requirements and enhance your skills if needed.

Interview Tips for Principal Site Reliability Engineer, AI Infrastructure Job Success

NVIDIA interview tips for Principal Site Reliability Engineer, AI Infrastructure

Here are some tips to help you prepare for and ace your Principal Site Reliability Engineer, AI Infrastructure job interview:

Before the Interview:

Research: Learn about NVIDIA's mission, values, products, and the specific job requirements, and find further information about its other openings.

Practice: Prepare answers to common interview questions and rehearse using the STAR method (Situation, Task, Action, Result) to showcase your skills and experiences.

Dress Professionally: Choose attire appropriate for the company culture.

Prepare Questions: Show your interest by having thoughtful questions for the interviewer.

Plan Your Commute: Allow ample time to arrive on time and avoid feeling rushed.

During the Interview:

Be Punctual: Arrive on time to demonstrate professionalism and respect.

Make a Great First Impression: Greet the interviewer with a handshake, smile, and eye contact.

Confidence and Enthusiasm: Project a positive attitude and show your genuine interest in the opportunity.

Answer Thoughtfully: Listen carefully and take a moment to formulate clear, concise responses. Highlight relevant skills and experiences using the STAR method.

Ask Prepared Questions: Demonstrate curiosity and engagement with the role and company.

Follow Up: Send a thank-you email to the interviewer within 24 hours.

Additional Tips:

Be Yourself: Let your personality shine through while maintaining professionalism.

Be Honest: Don't exaggerate your skills or experience.

Be Positive: Focus on your strengths and accomplishments.

Body Language: Maintain good posture, avoid fidgeting, and make eye contact.

Turn Off Phone: Avoid distractions during the interview.

Final Thought:

To prepare for your Principal Site Reliability Engineer, AI Infrastructure interview at NVIDIA, research the company, understand the job requirements, and practice common interview questions.

Highlight your leadership skills, achievements, and strategic thinking abilities. Be prepared to discuss your experience with HR, including your approach to meeting targets as a team player. Additionally, review NVIDIA's products and services and be prepared to discuss how you can contribute to the company's success.

By following these tips, you can increase your chances of making a positive impression and landing the job!

How to Set Up Job Alerts for Principal Site Reliability Engineer, AI Infrastructure Positions

Setting up job alerts for Principal Site Reliability Engineer, AI Infrastructure positions is easy with India Jobs Expertini. Simply visit our job alerts page, enter your preferred job title and location, and choose how often you want to receive notifications. You'll get the latest job openings sent directly to your email for free.