Who you are
You're someone who’s already shipped GenAI stuff—even if it was small: a chatbot, a RAG tool, or an agent prototype.
You live in Python, LangChain, LlamaIndex, Hugging Face, and vector DBs like FAISS or Milvus.
You know your way around prompts and everything that surrounds them: noisy chains, rerankers, retrieval.
You've deployed models or services on Azure/AWS/GCP, wrapped them in FastAPI endpoints, and maybe even wired up a bit of Terraform/ARM.
You’re not building from spreadsheets; you're iterating with real data, debugging hallucinations, and swapping out embeddings in production.
You can read blog posts and paper intros, follow new methods like QLoRA, and build on them.
You're fine with ambiguity and startup chaos—no strict specs, no roadmap, just a mission.
You work asynchronously over Slack, ask quick questions, push code that works, and help teammates stay afloat.
You're not satisfied with just getting things done—you want GenAI to feel reliable, usable, and maybe even fun.
What you’ll actually do
You’ll build real GenAI features: agentic chatbots for document lookup, conversation assistants, or knowledge workflows.
You’ll design and implement RAG systems: data ingestion, embeddings, vector indexing, retrieval, and prompt pipelines.
You’ll write inference APIs in FastAPI that work with vector stores and cloud LLM endpoints.
You’ll containerize services with Docker, push to Azure/AWS/GCP, wire basic CI/CD, monitor latency and faulty responses, and iterate fast.
You’ll experiment with LoRA/QLoRA fine-tuning on small LLMs, test prompt variants, and measure output quality.
You’ll collaborate with DevOps to ensure deployment reliability, QA to make tests more robust, and frontend folks to shape UX.
You’ll share your work in quick “demo & dish” sessions: what's working, what's broken, what you're trying next.
You’ll tweak embeddings, watch logs, and improve pipelines one experiment at a time.
You’ll help write internal docs or “how-tos” so others can reuse your work.
Skills and knowledge
Solid experience in Python backend development (FastAPI/Django)
Experienced with LLM frameworks: LangChain, LlamaIndex, CrewAI, or similar
Comfortable with vector databases: FAISS, Pinecone, Milvus
Able to fine-tune models using PEFT/LoRA/QLoRA
Knowledge of embeddings, retrieval systems, RAG pipelines, and prompt engineering
Familiar with cloud deployment and infrastructure as code (Azure, AWS, or GCP; Docker/K8s; Terraform/ARM)
Good understanding of monitoring and observability—tracking response latency, hallucinations, and costs
Able to read current research, try prototypes, and apply them pragmatically