TASK: Given m documents, compute the term-term relevance.
- Input: A text file in which each line represents one document
- Output: A list of term-term pairs sorted by similarity in descending order
Huong Pham, Computer Science - Data Analyst
Jakub Czachor, Computer Science - Data Engineer
Stacey Li, Computer Science - Data Engineer
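The TF-IDF weight of a term in a document is its term frequency (the number of times the term occurs in that document) multiplied by the logarithm of the total number of documents divided by the number of documents containing the term: tf-idf(t, d) = tf(t, d) * log(m / df(t)), where m is the total number of documents and df(t) is the number of documents that contain t.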
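
As a rough illustration of the formula, the following Python sketch computes TF-IDF weights and ranks term pairs. Several details are assumptions, not part of the task statement: tokens are lowercased and split on whitespace, the logarithm is natural, and term-term similarity is taken to be the cosine similarity between each term's vector of TF-IDF weights across documents. The function names `tf_idf`, `cosine`, and `term_pairs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(documents):
    """Return {term: {doc_index: weight}} using tf * log(m / df)."""
    # Assumption: lowercase, whitespace-separated tokens.
    tokenized = [doc.lower().split() for doc in documents]
    m = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = {}
    for i, tokens in enumerate(tokenized):
        for term, tf in Counter(tokens).items():
            # Assumption: natural logarithm (the base is not specified).
            weights.setdefault(term, {})[i] = tf * math.log(m / df[term])
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        # A term that appears in every document has an all-zero vector.
        return 0.0
    return dot / (norm_u * norm_v)


def term_pairs(documents):
    """Return [((term_a, term_b), similarity), ...] sorted descending."""
    weights = tf_idf(documents)
    pairs = [
        ((a, b), cosine(weights[a], weights[b]))
        for a, b in combinations(sorted(weights), 2)
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


if __name__ == "__main__":
    docs = ["apple banana", "apple cherry", "banana banana date"]
    for (a, b), score in term_pairs(docs):
        print(a, b, round(score, 3))
```

This sketch compares every pair of terms in memory, which grows quadratically with vocabulary size, so it is only meant for small inputs; reading the documents from the input file (one per line) is left to the caller.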