Job Overview
Company
BUILDING BLOCKS SOFTWARE SERVICES PRIVATE
Category
Computer Occupations
Job Description
Role Overview:

You will own the data pipeline powering our LLM training and fine-tuning. This includes ingestion, cleaning, deduplication, and building high-quality datasets from structured and unstructured sources.

Responsibilities:

- Design ETL pipelines for text, PDFs, and structured data.
- Implement data deduplication, filtering (toxicity, PII), and normalization.
- Train and manage tokenizers (SentencePiece/BPE).
- Build datasets for supervised fine-tuning and evaluation.
- Work closely with domain experts to generate instruction/response pairs.

Requirements:

- Strong skills in Python, SQL, and data wrangling frameworks (Pandas, Spark).
- Experience cleaning and preprocessing large text datasets.
- Familiarity with NLP-specific preprocessing (chunking, embeddings).
- Knowledge of cloud data storage (S3/GCS/Blob).
- Bonus: Prior experience in AI/ML pipelines.

(ref:hirist.tech)
BUILDING BLOCKS SOFTWARE SERVICES PRIVATE is actively hiring for this Data Engineer - Python/SQL/ETL position