Job description
 
Key Responsibilities
  
Design, build, and optimize high-performance microservices using Java 17+, Spring Boot, and reactive frameworks.
 
Develop and maintain APIs for model registration, inference request routing, and model version management.
 
Integrate with Seldon Core / KServe for model orchestration, deployment, A/B testing, and shadow testing of ML models.
 
Implement asynchronous, event-driven architectures using Kafka, Pub/Sub, or RabbitMQ for high-throughput inference workloads.
 
Build monitoring, tracing, and observability into inference pipelines using Prometheus, Grafana, and OpenTelemetry.
 
Work closely with MLOps and data engineering teams to integrate CI/CD for ML models, manage containers, and deploy to Kubernetes / cloud-native platforms (GCP GKE, AWS EKS, or Azure AKS).
 
Collaborate with data scientists and architects to ensure efficient model packaging, feature ingestion, and inference scaling.
 
Apply performance tuning, caching strategies (Redis, Hazelcast), and load balancing to sustain millisecond-level response times.
 
Mentor engineers and participate in design reviews to ensure architectural consistency, quality, and scalability.
 
  
  
Required Qualifications  
  
12+ years of backend engineering experience with high-performance Java (version 17 or above) in large-scale, distributed systems.
 
Deep understanding of microservices, event-driven architectures, and reactive programming.
 
Proven expertise in Spring Boot, Kafka, REST/gRPC APIs, JPA/Hibernate, Redis, and SQL/NoSQL databases.
 
Strong understanding of containerization (Docker) and Kubernetes orchestration.
 
Hands-on experience integrating or developing AI/ML inference platforms such as Seldon, KServe, TensorFlow Serving, or TorchServe.
 
Familiarity with model lifecycle management, feature stores, and inference request optimization.
 
Experience with cloud environments (AWS, GCP, or Azure) and CI/CD pipelines (Jenkins, ArgoCD, GitLab CI).
 
Knowledge of observability and performance tuning in high-concurrency, low-latency applications.
 
Strong communication, analytical, and problem-solving skills with the ability to lead technical discussions across teams.
 
  
  
Preferred Skills  
  
Exposure to Python-based ML pipelines and REST/gRPC integrations with data science environments.
 
Familiarity with model governance, A/B testing, and multi-model serving architectures.
 
Experience with API gateways (Kong, Istio, or Envoy) and service mesh architectures.
 
Understanding of data serialization formats (Avro, Protobuf, Parquet) and streaming analytics.
 
Knowledge of GPU/CPU optimization for inference workloads.
 
Experience Range: 10 to 15 years  
Location: Pan India  
Interview Type: Weekday Virtual Drive  
Date: 16-Oct-2025  
Day: Thursday  
 
                    
Required Skill Profession: IT