As you prepare your application for the Cohere Labs Scholars Program, you will see that one of the key questions asks you to prepare a video submission.

<aside> 📹

Video Interview Prompt

Choose a paper from the list below and suggest directions in which the work could be further expanded or explored.

</aside>

Video Interview Instructions

Tips for a Great Video


Eligible Papers to Select for Your Video Response

| Paper Title | Category | Abstract |
| --- | --- | --- |
| NeoBabel: A Multilingual Open Tower for Visual Generation | multimodal | We introduce NeoBabel, a multilingual open-source model for visual generation. NeoBabel is trained on a large-scale dataset of images and text in multiple languages, enabling it to generate high-quality images from text prompts in various languages. |
| When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | inference | This paper explores the impact of scaling up inference compute on the performance of multilingual large language models (LLMs). We find that increasing the number of samples during inference can significantly improve the model's ability to handle low-resource languages and complex linguistic tasks. |
| Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers | inference | We propose a novel approach for real-time targeting of long-tail entities at inference time. By using markers introduced at training time, our method efficiently identifies and focuses on rare entities, improving the model's ability to handle diverse and less frequent data. |
| One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers | multilingual | This work investigates the emergence of language plasticity in multilingual tokenizers. We demonstrate that a single tokenizer can adapt to multiple languages, improving cross-lingual transfer and reducing the need for language-specific tokenization. |
| The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It | safety | We review the current state of multilingual LLM safety research, highlighting the language gap in safety evaluations. We propose methods to measure and mitigate this gap, ensuring safer and more equitable multilingual models. |
| How to Improve the Robustness of Closed-Source Models on NLI | data | This paper presents techniques to enhance the robustness of closed-source models on natural language inference (NLI) tasks. We focus on improving generalization and reducing bias in these models. |
| Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects | policy | We argue for a new evaluation ecosystem to assess AI's real-world impact. Current benchmarks often fail to capture practical performance, and we propose a framework for more realistic and comprehensive evaluations. |
| Aya Vision: Advancing the Frontier of Multilingual Multimodality | multimodal | Aya Vision is a new multimodal model designed for multilingual tasks. It combines vision and language understanding, achieving state-of-the-art performance in cross-lingual image captioning and visual question answering. |
| Crosslingual Reasoning through Test-Time Scaling | inference | We introduce a test-time scaling approach for crosslingual reasoning. By scaling the model's inference-time computation, we improve its ability to handle diverse languages and complex reasoning tasks. |
| The Leaderboard Illusion | evaluation | This paper critiques the current state of AI leaderboards, arguing that they often fail to reflect real-world performance. We propose alternative evaluation methods to address these limitations. |
| Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation | multilingual | We evaluate multilingual LLMs using machine translation metrics, providing a new perspective on their performance. This approach highlights strengths and weaknesses in cross-lingual understanding. |
| Kaleidoscope: Exams for Multilingual Vision Evaluation | evaluation | Kaleidoscope is a new benchmark for evaluating multilingual vision models. It includes diverse tasks and languages, providing a comprehensive assessment of cross-lingual visual understanding. |
| When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | preference | We analyze personalized preference learning in real-world scenarios. Our study explores the challenges and opportunities in tailoring models to individual user preferences. |
| From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions | evaluation | This work evaluates LLMs in multi-session coding interactions, assessing their ability to collaborate and maintain context across sessions. We propose new metrics for evaluating such interactions. |
| Bridging the Data Provenance Gap Across Text, Speech, and Video | policy | We address the data provenance gap in multimodal learning, proposing methods to trace and verify the origins of text, speech, and video data. This ensures transparency and accountability in AI systems. |
| If You Can't Use Them, Recycle Them | merging | This paper explores the recycling of pre-trained models through model merging. We demonstrate that repurposing existing models can be an efficient and effective strategy for various applications. |
| [Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier](https://arxiv.org/pdf/2412.04261) | multilingual | Aya Expanse integrates recent research advancements to create a new multilingual model. It achieves superior performance in cross-lingual tasks, setting a new benchmark for multilingual AI. |
| Global MMLU | evaluation | Global MMLU is a new benchmark for evaluating multilingual models across diverse languages and tasks. It provides a comprehensive assessment of cross-lingual understanding and reasoning. |
| INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge | evaluation | INCLUDE is a benchmark for evaluating multilingual models' ability to incorporate regional knowledge. It assesses how well models understand cultural and geographical nuances in different languages. |
| M-RewardBench: Evaluating Reward Models in Multilingual Settings | multilingual | M-RewardBench is a new framework for evaluating reward models in multilingual contexts. It focuses on fairness and effectiveness across languages, ensuring equitable performance. |
| Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | merging | We compare mixing data versus merging models for multi-task learning. Our analysis provides insights into the optimal strategies for handling diverse tasks and datasets. |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | data | This paper introduces a diversity-centric data selection method with iterative refinement. It improves model performance by focusing on diverse and representative data samples. |
| BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | efficiency | We propose a simple and efficient method for parameter upcycling in mixture-of-experts models. This approach reduces computational costs while maintaining high performance. |
| Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | efficiency | Nexus combines specialization and adaptability in training mixture-of-experts models. It achieves efficient training and improved performance across diverse tasks. |
| MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions | multilingual | MURI creates high-quality instruction tuning datasets for low-resource languages using reverse instructions. This approach enhances model performance in under-represented languages. |
| Investigating Continual Pretraining in Large Language Models: Insights and Implications | continual training | We investigate continual pretraining in large language models, exploring its impact on performance and generalization. Our findings provide insights into effective pretraining strategies. |
| A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics | multilingual | This guide analyzes multilingual training data, uncovering dynamics of cross-lingual transfer. It provides practical insights for improving multilingual model training. |