
Urgent! GPU Infrastructure & Data Center Engineer Job Opening in Hyderabad – Now Hiring at PhoQtek labs

GPU Infrastructure & Data Center Engineer



Job description

About the Role

We are seeking a highly skilled IT Solutions & GPU Infrastructure Lead to take complete ownership of our GPU-based server infrastructure.

This role focuses on next-generation GPU systems used for AI/ML workloads, covering every aspect from data center colocation and setup to GPU slicing, MIG management, resource allocation, optimization, and compliance.

You will lead the end-to-end lifecycle of GPU infrastructure, ensuring all servers are optimized, secure, and production-ready for both internal and customer use.

Key Responsibilities

1. Colocation & Infrastructure Setup
• Take full ownership and responsibility for GPU colocation and end-to-end infrastructure setup.
• Coordinate with data centers on rack installation, power, and cooling.
• Deploy and configure GPU-based servers for production readiness.

2. GPU & AI/ML Infrastructure
• Manage GPU slicing and MIG (Multi-Instance GPU) for multi-tenant workloads.
• Install and maintain the NVIDIA software stack: CUDA, cuDNN, NCCL, and DCGM.
• Optimize GPU infrastructure for AI/ML workloads (TensorFlow, PyTorch, RAPIDS).
• Support multi-GPU scaling using NVLink and PCIe passthrough.

3. Systems & Virtualization
• Administer Linux-based environments (Ubuntu, CentOS, Rocky) as well as other operating environments.
• Manage virtualization platforms such as VMware, KVM, or Proxmox with GPU passthrough.
• Handle container orchestration with Docker and Kubernetes GPU Operators.
• Integrate high-performance storage (NFS, Ceph, SAN/NAS) for large-scale datasets.

4. Monitoring & Performance Optimization
• Monitor GPU and system performance using Prometheus, Grafana, NVIDIA DCGM, and nvidia-smi.
• Proactively detect, analyze, and resolve GPU or system bottlenecks.
• Optimize GPU nodes for training and inference performance.
• Implement structured logging, alerts, and usage reporting.
• Administer, manage, monitor, and maintain GPU infrastructure for AI workloads.

5. Security & Compliance
• Harden GPU servers for multi-tenant workloads.
• Manage driver, firmware, and software license compliance.
• Ensure infrastructure security and audit readiness through periodic patching and updates.

6. Networking & High-Performance I/O
• Configure and maintain high-speed network fabrics (InfiniBand, RDMA, RoCE).
• Optimize low-latency interconnects for distributed GPU workloads.
• Troubleshoot and enhance data


Required Skill Profession

Defense And Space Manufacturing


