Job Description
<p><p><b>About Norstella:</b><br/><br/>At Norstella, our mission is simple: to help our clients bring life-saving therapies to market quicker and help patients in need.<br/><br/>Founded in 2022, but with a history going back to 1939, Norstella unites best-in-class brands to help clients navigate the complexities at each step of the drug development life cycle and get the right treatments to the right patients at the right time.<br/><br/>Each organization (Citeline, Evaluate, MMIT, Panalgo, The Dedham Group) delivers must-have answers for critical strategic and commercial decision-making.<br/><br/>Together, via our market-leading brands, we help our clients:<br/><br/>Citeline: accelerate the drug development cycle.<br/><br/>Evaluate: bring the right drugs to market.<br/><br/>MMIT: identify barriers to patient access.<br/><br/>Panalgo: turn data into insight faster.<br/><br/>The Dedham Group: think strategically for specialty therapeutics.<br/><br/>By combining the efforts of each organization under Norstella, we can offer an even wider breadth of expertise, cutting-edge data solutions and expert advisory services alongside advanced technologies such as real-world data, machine learning and predictive analytics.<br/><br/>As one of the largest global pharma intelligence solution providers, Norstella has a footprint across the globe, with teams of experts delivering world-class solutions in the USA, UK, the Netherlands, Japan, China and India.<br/><br/><b>The Role: NLP Data Scientist, AI & Life Sciences</b><br/><br/>We are seeking a skilled NLP Data Scientist with a focus on cutting-edge language models to join our AI & Life Sciences Solutions team.<br/><br/>Your expertise in processing and understanding natural language, paired with your experience in Electronic Health Records (EHRs) and clinical data analysis, will be crucial in driving our data science initiatives.<br/><br/>You will be instrumental in developing rich, multimodal real-world datasets that will accelerate 
RWD-driven drug development within the pharmaceutical industry.<br/><br/><b>Responsibilities:</b><br/><br/>- Lead the application of advanced NLP and Large Language Models (LLMs), including state-of-the-art open-source models (e.g., Llama 3, Mixtral, Gemma) and other foundation models, to extract and interpret complex, unstructured medical data from diverse sources such as EHRs, clinical notes, and laboratory reports.<br/><br/></p><p>- Architect and deploy innovative, scalable NLP solutions that leverage the latest advances in deep learning to solve complex healthcare challenges, working closely with clinical scientists and data scientists.<br/><br/></p><p>- Design and implement robust data pipelines for cleaning, preprocessing, and validating unstructured data, ensuring the accuracy and reliability of all extracted insights.<br/><br/></p><p>- Develop and optimize prompt engineering and fine-tuning strategies to enhance LLM performance on specialized clinical tasks.<br/><br/></p><p>- Translate complex findings into clear, actionable insights for both technical and non-technical stakeholders, driving data-informed decisions across the organization.<br/><br/><b>Qualifications:</b><br/><br/>- Advanced Degree: Master's or Ph.D. 
in Computer Science, Data Science, Computational Linguistics, Computational Biology, Physics, or a related analytical field.<br/><br/></p><p>- Clinical Data Expertise: Proven experience (3+ years) handling and interpreting Electronic Health Records (EHRs) and clinical laboratory data.<br/><br/></p><p>- Advanced NLP & Generative AI: Deep experience (3+ years) with modern NLP techniques such as semantic search, knowledge graph construction, and few-shot learning.<br/><br/></p><p>- LLM Proficiency: Practical, hands-on experience (2+ years) with fine-tuning, prompt engineering, and inference optimization for LLMs.<br/><br/></p><p>- Technical Stack: Expert proficiency in Python and SQL, with strong experience using Hugging Face Transformers, PyTorch, and/or TensorFlow.<br/><br/></p><p>- Cloud Experience: Experience with large-scale data systems in a cloud environment, specifically AWS.<br/><br/></p><p>- MLOps & Workflow Automation: Familiarity with modern MLOps practices (e.g., version control with Git) and a proven track record of developing automated, scalable workflows.<br/><br/></p><p>- Analytical Prowess: A strong analytical mindset with excellent problem-solving skills and a detail-oriented approach to data.<br/><br/></p><p>- Communication: Exceptional verbal and written communication skills, with the ability to articulate complex technical findings to a diverse audience.<br/><br/><b>Preferred Qualifications:</b><br/><br/>- Healthcare Compliance: Experience managing Protected Health Information (PHI) and a working knowledge of healthcare data privacy laws such as HIPAA.<br/><br/></p><p>- Medical Terminologies: Familiarity with standard healthcare codes and terminologies, including ICD-10, CPT, LOINC, and SNOMED CT.<br/><br/></p><p>- Advanced Retrieval Systems: Practical experience with Retrieval-Augmented Generation (RAG) systems and vector databases for managing and querying large volumes of unstructured medical documents.<br/><br/><b>Location:</b> Remote, India.<br/></p><br/></p>
(ref:hirist.tech)