
Principal Site Reliability Engineer, AI Infrastructure Job Opening in Bengaluru at NVIDIA

Principal Site Reliability Engineer, AI Infrastructure



Job description

NVIDIA is widely considered to be one of the technology world’s most desirable employers.

We have some of the most forward-thinking and hardworking people in the world working for us.

If you're creative and autonomous, we want to hear from you! NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for over 30 years.

It’s an outstanding legacy of innovation that’s fueled by phenomenal technology and exceptional people.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing.

An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.

Doing what’s never been done before takes vision, innovation, and exceptional talent.

As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

Come join the team and see how you can make a lasting impact on the world.



What You Will Be Doing:
+ Architect, lead, and scale globally distributed production systems supporting AI/ML, HPC, and critical engineering platforms across hybrid and multi-cloud environments.
+ Design and lead implementation of automation frameworks that reduce manual tasks, promote resilience, and uphold standard methodologies for system health, change safety, and release velocity.
+ Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for sophisticated distributed systems.
+ Lead cross-organizational efforts to assess operational maturity, address systemic risks, and establish long-term reliability strategies in collaboration with engineering, infrastructure, and product teams.
+ Pioneer initiatives that influence NVIDIA’s AI platform roadmap, participating in co-development efforts with internal partners and external vendors, and staying ahead of academic and industry advances.
+ Publish technical insights (papers, patents, whitepapers) and drive innovation in production engineering and system design.
+ Lead and mentor global teams in a technical capacity, participating in recruitment, design reviews, and developing standard methodologies in incident response, observability, and system architecture.



What We Need to See:
+ 15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure, with a strong track record of leading platform-scale efforts and high-impact programs.
+ Deep expertise in Linux/Unix systems engineering and public/private cloud platforms (AWS, GCP, Azure, OCI).
+ Expert-level programming in Python and one or more languages such as C++, Go or Rust.
+ Demonstrated experience with Kubernetes at scale, CPU/GPU scheduling, microservice orchestration, and container lifecycle management in production.
+ Hands-on expertise in observability frameworks (Prometheus, Grafana, ELK, Loki, etc.) and Infrastructure as Code (Terraform, CDK, Pulumi).
+ Proficiency in Site Reliability Engineering concepts like error budgets, SLOs, distributed tracing, and architectural fault tolerance.
+ Ability to influence multi-functional collaborators and drive technical decisions through effective written and verbal communication.
+ Proven track record of completing long-term, forward-looking platform strategies.
+ Degree in Computer Science or a related field, or equivalent experience.



Ways to Stand Out from the Crowd:
+ Hands-on experience building platforms for large-scale AI training, inferencing, and data movement pipelines.
+ Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and orchestration frameworks (e.g., Ray, Kubeflow).
+ Expertise in hardware fleet observability, predictive failure analysis, and power/resource-aware scheduling.
+ Experience leading operational readiness efforts and reliability engineering in GPU-heavy environments.
+ Track record of driving cultural improvements in incident management, root cause analysis, and postmortem processes across large teams.



Join us and build the infrastructure that powers the world’s most advanced AI.

Apply now and make your mark at NVIDIA! Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.






Required Skill Profession

Other General





    Unlock Your Principal Site Potential: Insight & Career Growth Guide


  • Real-time Principal Site Jobs Trends in Bengaluru, India (Graphical Representation)

    Expertini's real-time analysis of the job market for Principal Site roles reports 37617 jobs in India and 4198 jobs in Bengaluru. The accompanying graph uses a bar chart to represent the number of jobs available and a trend line to illustrate the trend over time, giving a view of market share and opportunities for professionals in these regions.

  • Are You Looking for Principal Site Reliability Engineer, AI Infrastructure Job?

    Great news! NVIDIA is currently hiring and seeking a Principal Site Reliability Engineer, AI Infrastructure to join their team. Feel free to download the job details.


  • The Work Culture

    An organization's rules and standards set how people should be treated in the office and how different situations should be handled. The work culture at NVIDIA adheres to the cultural norms as outlined by Expertini.

    The fundamental ethical values are:
    1. Independence
    2. Loyalty
    3. Impartiality
    4. Integrity
    5. Accountability
    6. Respect for human rights
    7. Obeying Indian laws and regulations
  • What Is the Average Salary Range for Principal Site Reliability Engineer, AI Infrastructure Positions?

    The average salary range for a Principal Site Reliability Engineer, AI Infrastructure varies, but the pay scale is rated "Standard" in Bengaluru. Salary levels may vary depending on your industry, experience, and skills. It's essential to research and negotiate effectively. We advise reading the full job specification before proceeding with the application to understand the salary package.

  • What Are the Key Qualifications for Principal Site Reliability Engineer, AI Infrastructure?

    Key qualifications for Principal Site Reliability Engineer, AI Infrastructure typically include Other General and a list of qualifications and expertise as mentioned in the job specification. Be sure to check the specific job listing for detailed requirements and qualifications.

  • How Can I Improve My Chances of Getting Hired for Principal Site Reliability Engineer, AI Infrastructure?

    To improve your chances of getting hired for Principal Site Reliability Engineer, AI Infrastructure, consider enhancing your skills. Check your CV/Résumé Score with our free Tool. We have an in-built Resume Scoring tool that gives you the matching score for each job based on your CV/Résumé once it is uploaded. This can help you align your CV/Résumé according to the job requirements and enhance your skills if needed.

  • Interview Tips for Principal Site Reliability Engineer, AI Infrastructure Job Success
    NVIDIA interview tips for Principal Site Reliability Engineer, AI Infrastructure

    Here are some tips to help you prepare for and ace your job interview:

    Before the Interview:
    • Research: Learn about NVIDIA's mission, values, products, and the specific job requirements, and get further information about other openings.
    • Practice: Prepare answers to common interview questions and rehearse using the STAR method (Situation, Task, Action, Result) to showcase your skills and experiences.
    • Dress Professionally: Choose attire appropriate for the company culture.
    • Prepare Questions: Show your interest by having thoughtful questions for the interviewer.
    • Plan Your Commute: Allow ample time to arrive on time and avoid feeling rushed.
    During the Interview:
    • Be Punctual: Arrive on time to demonstrate professionalism and respect.
    • Make a Great First Impression: Greet the interviewer with a handshake, smile, and eye contact.
    • Confidence and Enthusiasm: Project a positive attitude and show your genuine interest in the opportunity.
    • Answer Thoughtfully: Listen carefully and take a moment to formulate clear and concise responses. Highlight relevant skills and experiences using the STAR method.
    • Ask Prepared Questions: Demonstrate curiosity and engagement with the role and company.
    • Follow Up: Send a thank-you email to the interviewer within 24 hours.
    Additional Tips:
    • Be Yourself: Let your personality shine through while maintaining professionalism.
    • Be Honest: Don't exaggerate your skills or experience.
    • Be Positive: Focus on your strengths and accomplishments.
    • Body Language: Maintain good posture, avoid fidgeting, and make eye contact.
    • Turn Off Phone: Avoid distractions during the interview.
    Final Thought:

    To prepare for your Principal Site Reliability Engineer, AI Infrastructure interview at NVIDIA, research the company, understand the job requirements, and practice common interview questions.

    Highlight your leadership skills, achievements, and strategic thinking abilities. Be prepared to discuss your experience with HR, including your approach to meeting targets as a team player. Additionally, review NVIDIA's products or services and be prepared to discuss how you can contribute to their success.

    By following these tips, you can increase your chances of making a positive impression and landing the job!

  • How to Set Up Job Alerts for Principal Site Reliability Engineer, AI Infrastructure Positions

    Setting up job alerts for Principal Site Reliability Engineer, AI Infrastructure is easy with India Jobs Expertini. Simply visit our job alerts page, enter your preferred job title and location, and choose how often you want to receive notifications. You'll get the latest job openings sent directly to your email for free.