OnedayOneyeah/factcheck-ko-2023

Author

👩‍🎓 한예원 / Yewon (Chloe) Han

🎓 서울대학교 언론정보학과 / Department of Communication at Seoul National University

✉️ sign1028@snu.ac.kr

Abstract

Fact-checking has been a social practice of producing evidence-based analyses of the accuracy of texts that may have a significant impact on society. As the volume of misinformation has surged and its forms have diversified, fact-checking automation has been suggested as a potential solution to meet the demand. This study aims to develop an automated Korean fact-checking model using the factcheck-ko datasets. Three hypotheses are formulated, and experiments are conducted to test them. The results show that the KNN Sentence Selection model has clear limitations, as it returns sentences that are neither extensive nor exclusive, rejecting the first hypothesis. The regression approach falls short of the multi-class Recognition Textual Entailment (RTE) model trained with Multiple Premises Entailment (MPE) datasets, but it offers valid advantages in terms of robustness and per-class recall. The MPE approach is found to be valid and scalable, achieving the highest test accuracy above the baseline. In the Discussion chapter, we raise questions about the validity of NotEnoughInfo as a separate class and highlight the need for further discussion of metrics and datasets adjusted to real-world scenarios. The potential of end-to-end fact-checking models is also reviewed.

Keywords

fact-checking automation, natural language inference, FEVER, information verification

Introduction

  1. Background

    Fact-checking is the practice of making evidence-based analyses of the accuracy of a political claim, news report, or other public text, intended to protect the public from rumors or false beliefs and to induce politicians to be more accountable for their statements (Graves, 2019). This process tends to be time-consuming for human fact-checkers, as it involves in-depth interpretation of the context behind a statement and sound verification (or confrontation) against a variety of credible sources.

    As the volume of misinformation has surged and its forms have diversified in recent years, the demand for fact-checking has increased accordingly. However, due to limited capacity, the present manual fact-checking system sometimes fails to keep up with this high demand. Fact-checking automation has been suggested in this context as a potential tool to speed up the fact-checking process and, furthermore, to enable real-time verification. Hence, a number of fact-checking platforms, as well as big tech companies, have already started to invest in developing automated fact-checking systems. For instance, Duke Reporters’ Lab, whose lead is the creator of PolitiFact, developed the live fact-checking program Squash, which detects politicians’ statements in video and matches them to existing fact-checking results. Google is also funding research on the role of ML and AI in fact-checking systems, led by Kate Wilkinson, senior product manager at UK-based Full Fact, and Pablo Fernández, executive director of the Argentina-based Chequeado.

  2. Motivation

    This paper presents the process of solving the fact-checking task, a branch of information verification that has been studied since 2018, when Thorne et al. introduced the FEVER shared task. The goal of this research is to develop an automated Korean fact-checking model based on the factcheck-ko datasets published by Seoul National University.

    To enhance the performance of the model over the baseline, we formulate three hypotheses and test their validity through experiments. A new Sentence Selection (SS) model is designed based on an inner-product approach that requires no training, and we test whether it outperforms the baseline; a minimal sketch of this idea is shown below. The Recognition Textual Entailment (RTE) model is rebuilt using either a pre-trained regression model or datasets constructed in the Multiple Premises Entailment (MPE) format.
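    As a rough illustration of the first hypothesis, the sketch below ranks candidate evidence sentences by the inner product between sentence embeddings, with no task-specific training. The encoder name and top-k value are assumptions made for illustration, not the exact configuration used in our experiments.

    ```python
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Encoder choice is an assumption for illustration only; the actual SS model
    # may use a different Korean encoder and scoring scheme.
    encoder = SentenceTransformer("jhgan/ko-sroberta-multitask")

    def select_evidence(claim, candidates, k=5):
        """Rank candidate sentences by inner product with the claim embedding (no training)."""
        claim_vec = encoder.encode([claim])[0]   # shape: (d,)
        cand_vecs = encoder.encode(candidates)   # shape: (n, d)
        scores = cand_vecs @ claim_vec           # one inner-product score per candidate
        top = np.argsort(-scores)[:k]
        return [candidates[i] for i in top]
    ```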

    Ultimately, we expect this work to contribute to combating misinformation and fake news more effectively.

Related Works

  1. The Fact Extraction and VERification (FEVER) Shared Task

    The fact-checking task essentially corresponds to the FEVER task (Thorne et al., 2018), which is to classify a human-written factoid claim into one of three classes: Supported, Refuted, or NotEnoughInfo; the best performance was achieved by UNC-NLP with a 64.21% FEVER score. Since the FEVER task was hosted as a competition, the fundamental structure of the submitted models basically follows that of the FEVER baseline model, which consists of three components: Document Retrieval (DR), Sentence Selection (SS), and Natural Language Inference (NLI). The datasets for this task consist of 185,445 human-generated claims, manually verified against the introductory sections of Wikipedia pages and labeled afterwards. They are regarded as a benchmark for the information verification task.
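    For orientation, the following is a minimal skeleton of this three-component structure. The component internals are placeholders; the sketch only illustrates how a claim flows from Document Retrieval through Sentence Selection to the final three-way label.

    ```python
    from typing import Callable, List

    LABELS = {"SUPPORTED", "REFUTED", "NOT ENOUGH INFO"}

    def verify_claim(
        claim: str,
        retrieve_docs: Callable[[str], List[str]],                # DR: claim -> candidate documents
        select_sentences: Callable[[str, List[str]], List[str]],  # SS: claim + documents -> evidence sentences
        classify: Callable[[str, List[str]], str],                # NLI: claim + evidence -> label
    ) -> str:
        docs = retrieve_docs(claim)
        evidence = select_sentences(claim, docs)
        label = classify(claim, evidence)
        assert label in LABELS
        return label
    ```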

    The SOTA model for the FEVER task by UNC-NLP consists of three interconnected components: a document retriever, a sentence selector, and a claim verifier. The document retriever selects candidate wiki-documents based on keyword matching and page-view frequency statistics. The sentence selector uses a sequence-matching neural network to select evidential sentences by comparing the claim with all sentences in the candidate documents. The claim verifier is a 3-way neural NLI classifier that concatenates all selected evidence as the premise, takes the claim as the hypothesis, and labels each claim as ‘support’, ‘refute’, or ‘not enough info’. For further improvement, the sentence selector and the NLI classifier were integrated end-to-end by feeding the sentence similarity score to the claim verifier as an additional token-level feature, so that the model can learn the contextual importance of each piece of evidence as weights.
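    A hedged sketch of that coupling mechanism: the sentence-level similarity score is appended to every token embedding of the evidence before it reaches the claim verifier. The tensor shapes and wiring below are illustrative assumptions, not the UNC-NLP implementation.

    ```python
    import torch

    def append_relevance_feature(token_embeddings: torch.Tensor,
                                 relevance: torch.Tensor) -> torch.Tensor:
        """token_embeddings: (batch, seq_len, d); relevance: (batch,) similarity score per evidence sentence.

        Returns embeddings of shape (batch, seq_len, d + 1) with the score broadcast to every token.
        """
        expanded = relevance.view(-1, 1, 1).expand(-1, token_embeddings.size(1), 1)
        return torch.cat([token_embeddings, expanded], dim=-1)
    ```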

  2. Implementing fact-checking automation in real-world scenarios

    Trokhymovych (2021) reproduced the results of SOTA solutions and laid the groundwork for an end-to-end open-source tool for automated fact-checking. The main focus of the study was on speeding up the verification process in accordance with real-world scenarios. Hence, the model was constructed in two parts: a Wikipedia Search API and Natural Language Inference based on a Siamese network using a BERT-like model as a trainable sentence encoder.
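    The following is a minimal sketch of a Siamese setup in that spirit: a shared BERT-like encoder embeds the claim and the evidence separately, and a small head classifies the pair from the combined features [u; v; |u - v|]. The encoder name, pooling, and head design are assumptions for illustration, not the paper’s exact configuration.

    ```python
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class SiameseNLI(nn.Module):
        def __init__(self, encoder_name: str = "bert-base-multilingual-cased", num_labels: int = 3):
            super().__init__()
            self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
            self.encoder = AutoModel.from_pretrained(encoder_name)   # shared weights for claim and evidence
            hidden = self.encoder.config.hidden_size
            self.head = nn.Linear(hidden * 3, num_labels)            # -> support / refute / not enough info

        def embed(self, texts):
            batch = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
            hidden_states = self.encoder(**batch).last_hidden_state
            return hidden_states.mean(dim=1)                         # mean pooling over tokens

        def forward(self, claims, evidences):
            u, v = self.embed(claims), self.embed(evidences)
            features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
            return self.head(features)                               # logits over the three classes
    ```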

    Arana-Catania et al. (2021) proposed new techniques for developing a veracity assessment model using graph convolutional neural networks and attention-based approaches, in response to the widespread misinformation surrounding the COVID-19 pandemic. They trained and tested their model on PANACEA, a comprehensive COVID-19 fact-checking dataset containing two subsets: PANACEA Large, with 5,143 claims, and PANACEA Small, with 1,709 claims. This domain-specific fact-checking research differs from classic FEVER-task research in that it does not have an independent NotEnoughInfo class, but the model still uses the same three-step process of Document Retrieval, Sentence Retrieval, and Veracity Assessment as the FEVER baseline model.

    Domain adaptation research is a significant area of interest for implementing fact-checking programs in real-world scenarios. Since the FEVER dataset is in English, there is a growing need to construct datasets for other languages. To address this need, Kim (2021) published Korean datasets consisting of approximately 80k claims with paired evidence, following the classic FEVER task framework. These are human-annotated claims and evidence sentences based on Korean Wikipedia, together with news data provided by Yonhap News that was used to annotate evidence for each claim. This effort was made possible through a Memorandum of Understanding (MOU) between Yonhap News, Seoul National University, and the Community Media Foundation. The datasets and baseline model are available on GitHub (https://github.com/hongcheki/factcheck-ko-2021), and the structure follows the FEVER baseline model presented by Thorne et al. (2018), consisting of Document Retrieval (DR), Sentence Selection (SS), and Recognition Textual Entailment (RTE).

  3. End-to-end fact-checking model

    The end-to-end design for fact-checking models is a novel approach that provides insights into redefining the task. The multi-task learning with bi-directional attention (EMBA) model is one example of an end-to-end fact-checking model (Li et al., 2018). By processing the sentence extraction and claim verification tasks simultaneously, it aims to provide a more streamlined and efficient approach to fact-checking. Although its performance on the FEVER task is not as high as that of the UNC-NLP state-of-the-art model, with a FEVER score of 29.22, the approach offers a new perspective on treating fact-checking as an end-to-end process.