About the Role
We are looking for a highly skilled Backend Engineer with a strong background in Python, system design, and infrastructure to join our team.
You will be responsible for designing, building, and maintaining scalable backend systems, while collaborating with cross-functional teams to deliver robust and efficient solutions.
This role requires someone who can think end-to-end: from designing high-level architecture and implementing core services to ensuring production-grade reliability and performance.
Key Responsibilities
- Develop and maintain backend services and APIs using Python and Node.js.
- Design scalable, resilient, and maintainable systems, with a focus on system architecture and distributed systems.
- Integrate AI and large language models (LLMs) into applications, ensuring performance, scalability, and cost-efficiency.
- Collaborate with AI/ML teams to deploy models into production pipelines.
- Optimize infrastructure for AI workloads (GPU usage, caching, batch processing).
- Build and maintain monitoring, logging, and observability for AI-powered systems.
- Troubleshoot and resolve issues in production systems while maintaining high reliability.
- Participate in design and code reviews, and drive engineering best practices across the team.
- Automate deployment pipelines for backend and AI services (CI/CD, IaC).
Required Skills & Qualifications
- Strong experience in Python (FastAPI preferred; Flask, Django, or similar) or Node.js (Express preferred; Fastify or similar).
- Solid understanding of system design principles: scalability, fault tolerance, and distributed systems.
- Experience with infrastructure and DevOps: Docker, Kubernetes, Terraform, and CI/CD pipelines.
- Hands-on experience with cloud platforms (AWS, Azure, GCP), especially for AI workloads.
- Knowledge of databases (SQL & NoSQL) and caching systems (Redis, Memcached).
- Experience integrating LLMs or AI APIs (OpenAI, Hugging Face, LangChain, etc.) into production systems.
- Familiarity with messaging/streaming systems (Kafka, RabbitMQ).
- Monitoring and observability experience (Prometheus, Grafana, ELK).
- Strong problem-solving, debugging, and analytical skills.
- Excellent communication and collaboration skills.
Nice to Have
- Experience with generative AI pipelines, vector databases, and embeddings.
- Familiarity with MLOps tools (MLflow, BentoML, Ray Serve, etc.).
- Knowledge of event-driven architectures and microservices.
- Prior experience in AI/LLM-focused startups or high-scale AI systems.
What We Offer
- Opportunity to work on challenging, large-scale systems with real-world impact.
- Collaborative team culture with a focus on learning and innovation.
- Competitive compensation and growth opportunities.