
Senior/Middle Data Scientist (Benchmarking & Alignment)
- Ukraine
- Permanent position
- Full-time
- Analyze benchmarking datasets, identify gaps, and design, implement, and maintain a comprehensive benchmarking framework for the Ukrainian language.
- Research and integrate state-of-the-art evaluation metrics for factual accuracy, reasoning, language fluency, safety, and alignment.
- Design and maintain testing frameworks to detect hallucinations, biases, and other failure modes in LLM outputs.
- Develop pipelines for synthetic data generation and adversarial example creation to challenge the model's robustness.
- Collaborate with human annotators, linguists, and domain experts to define evaluation tasks and collect high-quality feedback.
- Develop tools and processes for continuous evaluation during model pre-training, fine-tuning, and deployment.
- Research and develop best practices and novel techniques in LLM training pipelines.
- Analyze benchmarking results to identify model strengths, weaknesses, and improvement opportunities.
- Work closely with other data scientists to align training and evaluation pipelines.
- Document methodologies and share insights with internal teams.
- 3+ years of experience in Data Science or Machine Learning, preferably with a focus on NLP.
- Proven experience in machine learning model evaluation and/or NLP benchmarking.
- Advanced degree (Master's or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.
- Good knowledge of natural language processing techniques and algorithms.
- Hands-on experience with modern NLP approaches, including embedding models, semantic search, text classification, sequence tagging (NER), transformers/LLMs, and RAG.
- Familiarity with LLM training and fine-tuning techniques.
- Proficiency in Python and common data science and NLP libraries (pandas, NumPy, scikit-learn, spaCy, NLTK, langdetect, fasttext).
- Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
- Solid understanding of RLHF concepts and related techniques (preference modeling, reward modeling, reinforcement learning).
- Ability to write efficient, clean code and debug complex model issues.
- Solid understanding of data analytics and statistics.
- Experience creating and managing test datasets, including annotation and labeling processes.
- Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
- Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.
- Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
- Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
- Experience with cloud platforms (AWS, GCP or Azure) and big data technologies (Spark, Hadoop, Ray, Dask) for scaling data processing or model training is a plus.
- Experience working in a collaborative, cross-functional environment.
- Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.
- Prior work on LLM safety, fairness, and bias mitigation.
- Familiarity with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
- Knowledge of data annotation workflows and human feedback collection methods.
- Publications in NLP/ML conferences or contributions to open-source NLP projects.
- Active participation in the AI community or demonstrated continuous learning (e.g., Kaggle competitions, research collaborations) indicating a passion for staying at the forefront of the field.
- Familiarity with the Ukrainian language and context.
- Understanding of cultural and linguistic nuances that could inform model training and evaluation in a Ukrainian context.
- Knowledge of Ukrainian benchmarks, or familiarity with other evaluation datasets and leaderboards for large models, is an advantage given our project's focus.
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
- Experience working alongside MLOps engineers to streamline the deployment and monitoring of NLP models.
- Innovative mindset with the ability to approach open-ended AI problems creatively.
- Comfort in a fast-paced R&D environment where you can adapt to new challenges, propose solutions, and drive them to implementation.
- Office or remote - it's up to you. You can work from anywhere, and we will arrange your workplace.
- Remote onboarding.
- Performance bonuses for everyone (annual or quarterly, depending on the role).
- Employee training: opportunities to learn through the company's library, internal resources, and partner programs.
- Health and life insurance.
- Wellbeing program and corporate psychologist.
- Reimbursement of expenses for Kyivstar mobile communication.