📄 Summary: This paper introduces CV-18 NER, the first publicly available dataset for extracting named entities directly from Arabic speech, created by annotating the Common Voice 18 corpus with 21 entity types following the Wojood schema. End-to-end neural models substantially outperform traditional pipelines (ASR followed by text NER), reaching a 37.0% character error rate on the test set. The work addresses a critical gap in Arabic speech processing, where morphological complexity and limited resources have hindered progress.
💡 Key Insight: For Arabic speech, extracting entities directly from audio beats the traditional approach of first converting speech to text and then finding entities.
🔗 Read Paper
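The gap between pipeline and end-to-end approaches comes down to error propagation: any ASR mistake corrupts the text the downstream NER tagger sees. A minimal sketch of that effect, using a made-up gazetteer lookup and toy transcripts rather than the paper's neural models:

```python
# Toy illustration of why ASR errors hurt pipeline speech NER.
# The gazetteer, tagger, and transcripts are invented for illustration;
# the paper's systems are neural models, not dictionary lookups.

GAZETTEER = {"القاهرة": "LOC", "محمد": "PER"}  # hypothetical entity dictionary

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Dictionary-based tagger standing in for a text NER model."""
    return [(tok, GAZETTEER[tok]) for tok in text.split() if tok in GAZETTEER]

gold_transcript = "سافر محمد إلى القاهرة"   # "Muhammad traveled to Cairo"
asr_transcript = "سافر محمود إلى القاهرة"   # ASR misrecognized the name

# Pipeline NER is only as good as the transcript it receives.
print(tag_entities(gold_transcript))  # both entities found
print(tag_entities(asr_transcript))   # the person entity is lost to an ASR error
```

An end-to-end model sidesteps this by predicting entities from the audio itself, so there is no intermediate transcript to corrupt.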
📄 Summary: This study compares radiologist evaluations with LLM-as-a-judge assessments of the quality of Japanese machine translations of English chest CT reports. Using blinded pairwise comparisons across four criteria (terminology accuracy, readability, quality, authenticity), the research investigates whether LLMs can reliably evaluate translation quality in clinical settings. The findings help establish the validity of using LLMs to assess medical document translation rather than requiring expert human review.
💡 Key Insight: AI judges can sometimes accurately evaluate whether other AIs have correctly translated medical documents, potentially reducing reliance on expert human reviewers.
🔗 Read Paper
📄 Summary: This paper addresses limitations of LLM-based talent-recruitment systems that use a pointwise approach (evaluating each candidate individually), which leads to position bias and the "lost-in-the-middle" problem, where candidates appearing in the middle of a list are undervalued. The authors propose a ranking-based paradigm that implicitly captures relationships among candidates, reducing token consumption while improving recommendation quality. This approach makes LLM-based hiring systems more efficient and less sensitive to candidate position.
💡 Key Insight: Asking an LLM to rank job candidates together, rather than evaluating them one by one, produces fairer and more efficient hiring recommendations.
🔗 Read Paper
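The token-efficiency argument is simple arithmetic: pointwise scoring repeats the job description once per candidate, while a single ranking prompt includes it once. A back-of-the-envelope sketch with invented token counts:

```python
# Rough token-cost comparison, pointwise vs. ranking prompts.
# All token counts are made up for illustration.
job_desc_tokens = 800
resume_tokens = 600
n_candidates = 20

# Pointwise: one prompt per candidate, job description repeated every time.
pointwise = n_candidates * (job_desc_tokens + resume_tokens)

# Ranking: one prompt with the job description once and all resumes together.
ranking = job_desc_tokens + n_candidates * resume_tokens

print(pointwise, ranking)  # ranking avoids (n - 1) copies of the job description
```

The saving grows with the number of candidates, and the ranking prompt additionally lets the model compare candidates against each other rather than scoring each in isolation.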
📄 Summary: This paper introduces Neuro-RIT, a framework that improves how retrieval-augmented language models (RALMs) handle noisy or irrelevant retrieved information by identifying and updating only the specific neurons responsible for processing good versus bad context. Unlike previous coarse-grained approaches that update entire layers, the method uses attribution-based neuron mining to precisely align sparse neurons, making RALMs more robust when retrieval sources are imperfect.
💡 Key Insight: By tuning individual neurons rather than whole layers, language models can be trained to better ignore bad information from search results.
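The core selection step — keep updates only for the most strongly attributed neurons — can be sketched as a gradient mask. The attribution scores below are random stand-ins (the paper derives them from attribution over good versus noisy retrieved contexts), and the top-k fraction is a made-up hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 1024
top_k = 32  # sparse subset of neurons to tune (illustrative value)

# Stand-in attribution scores; the real method computes these by
# attributing model behavior on good vs. noisy retrieved contexts.
attribution = rng.normal(size=n_neurons)

# Mine the k neurons with the largest absolute attribution.
selected = np.argsort(np.abs(attribution))[-top_k:]

# Mask that zeroes gradient updates everywhere except the mined neurons.
mask = np.zeros(n_neurons)
mask[selected] = 1.0

grad = rng.normal(size=n_neurons)
sparse_update = grad * mask  # only the selected neurons receive updates

print(int(mask.sum()))  # number of neurons kept
```

Updating only this sparse subset is what distinguishes the approach from coarse-grained methods that fine-tune entire layers.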