We are building next-generation AI Voice Agents for recruiters.
You’ll design and implement real-time, low-latency conversational systems that combine LLMs, STT, TTS, RAG, and streaming pipelines.
The backend runs in Python, with a React/Node.js frontend.
If you have built real-time voice systems that can handle human interruptions and dynamic context, this role fits you.
Key Responsibilities
- Build and optimize real-time AI Voice Agents in Python.
- Integrate TTS (e.g., ElevenLabs) and STT (e.g., Deepgram) systems for natural, fast voice interaction.
- Implement LLM integration with a focus on latency reduction and interruption handling.
- Develop caching layers for prompts and audio streams to reduce cost and delay.
- Design and maintain SIP, WebRTC, and WebSocket pipelines for call handling.
- Handle automatic answering machine detection (AMD) and dynamic call routing.
- Work with React/Node.js frontend teams for end-to-end real-time communication.
Required Skills
- Python: Strong backend development experience.
- Real-Time AI Voice Systems: Proven working experience is mandatory.
- LLM Integration: Deep understanding of latency, interruption handling, and RAG pipelines.
- Networking Protocols: Hands-on experience with SIP, WebRTC, and WebSockets.
- Frontend Collaboration: Working knowledge of React and Node.js.
Preferred Skills
- Experience with ElevenLabs, Deepgram, or other leading TTS/STT APIs.
- Knowledge of audio stream caching, buffering, and GPU optimization.
- Familiarity with FreeSWITCH, Asterisk, or similar VoIP platforms.
- Background in answering machine detection (AMD) and call classification.
Why Join Us
You’ll work on cutting-edge voice-first AI infrastructure used by staffing companies worldwide.
Early-stage team, fast decisions, deep tech stack, and clear impact.
Salary is in INR.