
TASK: Given m documents, compute the term-term relevance.
- Input: A text file in which each line represents a document
- Output: A list of term-term pairs sorted in descending order of similarity


Team Members

Huong Pham, Computer Science - Data Analyst

Jakub Czachor, Computer Science - Data Engineer

Stacey Li, Computer Science - Data Engineer


Introduction



TFIDF_FORMULA.png

The TF-IDF score of a term in a document is its term frequency (the number of times the term occurs in that document) multiplied by the logarithm of the total number of documents divided by the number of documents that contain the term.
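As an illustration of the formula, here is a minimal Python sketch. It is not the project's source code. The function name tf_idf, the lowercase whitespace tokenization, and the natural logarithm are all assumptions made for the example, since the text above does not specify a tokenizer or a log base.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Assumption: terms are lowercase, whitespace-separated tokens.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # raw occurrence counts in this document
        scores.append({
            # Assumption: natural log; the source only says "the log".
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


docs = ["apple banana apple", "banana cherry", "banana"]
result = tf_idf(docs)
for line_number, doc_scores in enumerate(result, start=1):
    print(line_number, doc_scores)
```

In this toy input, "banana" appears in all three documents, so its score is 0 everywhere, while "apple" occurs twice in the first document only and scores 2 * log(3) there.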

Code Walkthrough

Source Code