Job Description
About the Role
We are looking for a highly skilled Backend Engineer with a strong background in Python, system design, and infrastructure to join our team.
You will be responsible for designing, building, and maintaining scalable backend systems while collaborating with cross-functional teams to deliver robust and efficient solutions. This role requires someone who can think end-to-end, from designing high-level architecture and implementing core services to ensuring production-grade reliability and performance.

Key Responsibilities
- Develop and maintain backend services and APIs using Python and Node.js.
- Design scalable, resilient, and maintainable systems with a focus on system architecture and distributed systems.
- Integrate AI and large language models (LLMs) into applications, ensuring performance, scalability, and cost-efficiency.
- Collaborate with AI/ML teams to deploy models into production pipelines.
- Optimize infrastructure for AI workloads (GPU usage, caching, batch processing).
- Build and maintain monitoring, logging, and observability for AI-powered systems.
- Troubleshoot and resolve issues in production systems while maintaining high reliability.
- Participate in design and code reviews, and drive engineering best practices across the team.
- Automate deployment pipelines for backend and AI services (CI/CD, IaC).

Required Skills & Qualifications
- Strong experience in Python (FastAPI preferred; Flask, Django, or similar) or Node.js (Express preferred; Fastify or similar).
- Solid understanding of system design principles: scalability, fault tolerance, and distributed systems.
- Experience with infrastructure and DevOps: Docker, Kubernetes, Terraform, and CI/CD pipelines.
- Hands-on experience with cloud platforms (AWS, Azure, GCP), especially for AI workloads.
- Knowledge of databases (SQL and NoSQL) and caching systems (Redis, Memcached).
- Experience integrating LLMs or AI APIs into production systems (OpenAI, HuggingFace, LangChain, etc.).
- Familiarity with messaging/streaming systems (Kafka, RabbitMQ).
- Monitoring and observability experience (Prometheus, Grafana, ELK).
- Strong problem-solving, debugging, and analytical skills.
- Excellent communication and collaboration skills.

Nice to Have
- Experience with generative AI pipelines, vector databases, and embeddings.
- Familiarity with MLOps tools (MLflow, BentoML, Ray Serve, etc.).
- Knowledge of event-driven architectures and microservices.
- Prior experience in AI/LLM-focused startups or high-scale AI systems.

What We Offer
- Opportunity to work on challenging, large-scale systems with real-world impact.
- Collaborative team culture with a focus on learning and innovation.
- Competitive compensation and growth opportunities.