Job Overview
Category
Computer Occupations
Job Description
<p><p><b>Description:</b></p><br/>
<p><b>About the Role:</b><br/><br/>We are seeking a GenAI Data Engineer to design, build, and optimize data pipelines for unstructured and semi-structured content, integrating advanced AI/ML capabilities. This role combines modern ETL expertise with vector database and GenAI integration to support intelligent document processing and semantic search applications.</p><br/>
<p><b>Key Responsibilities:</b><br/><br/>- Develop and maintain data ingestion pipelines using Azure Data Factory (ADF) and Databricks for structured and unstructured data.<br/><br/>- Create notebooks to process PDF and Word documents, including extracting text, tables, charts, graphs, and images.<br/><br/>- Apply NLP and embedding models (e.g., OpenAI, Hugging Face, sentence-transformers) to convert extracted content into embeddings.<br/><br/>- Store embeddings and metadata in vector databases (e.g., FAISS, Pinecone, Milvus, Weaviate, ChromaDB).<br/><br/>- Design and implement semantic search and retrieval workflows to enable prompt-based query capabilities.<br/><br/>- Optimize ETL pipelines for scalability, reliability, and performance.<br/><br/>- Collaborate with data scientists and solution architects to integrate GenAI capabilities into enterprise applications.<br/><br/>- Follow best practices for code quality, modularity, and documentation.</p><br/>
<p><b>Required Skills & Experience:</b><br/><br/>- Proven experience with Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.<br/><br/>- Strong programming experience in Python (pandas, PySpark, PyPDF, python-docx, OCR libraries, etc.).<br/><br/>- Hands-on experience with vector databases and semantic search implementation.<br/><br/>- Understanding of embedding models, LLM-based retrieval, and prompt engineering.<br/><br/>- Familiarity with handling multi-modal data (text, tables, images, charts).<br/><br/>- Strong knowledge of data modeling, indexing, and query optimization.<br/><br/>- Experience with cloud platforms (Azure preferred).<br/><br/>- Strong problem-solving, debugging, and communication skills.</p><br/>
<p><b>Nice to Have:</b><br/><br/>- Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines.<br/><br/>- Exposure to MLOps practices and LLM fine-tuning.<br/><br/>- Familiarity with enterprise-scale document management systems.</p><br/></p> (ref:hirist.tech)
Unloadbox is actively hiring for the Data Engineer - ETL/Generative AI position.