Setup

We currently use ChatGPT to generate training data with two different prompts: a simple one that names only the base word, and a detailed one that lists all inflected forms of the word. For example:

  1. “Please provide 30 sentences in German using word grundsätzlich”

  2. “Please provide 30 sentences in German using words: grundsätzlich, grundsätzliche, grundsätzlicher, grundsätzlichem, grundsätzlichen in different context and in different place inside of the sentence”
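The two prompts above could be built from a template so that the same wording is reused for every target word. The following is a minimal sketch, not the script actually in use: the function name `build_prompts` and its parameters are hypothetical, and it only produces the prompt strings (it does not call ChatGPT).

```python
def build_prompts(word, forms, n_sentences=30, language="German"):
    """Return (simple_prompt, detailed_prompt) for one target word.

    `word` is the base form; `forms` is the list of inflected forms.
    Both names are illustrative assumptions, not part of an existing tool.
    """
    # Prompt 1: names only the base word.
    simple = (
        f"Please provide {n_sentences} sentences in {language} "
        f"using word {word}"
    )
    # Prompt 2: lists every form and asks for varied context and position.
    detailed = (
        f"Please provide {n_sentences} sentences in {language} using words: "
        f"{', '.join(forms)} in different context and in different place "
        "inside of the sentence"
    )
    return simple, detailed


simple_prompt, detailed_prompt = build_prompts(
    "grundsätzlich",
    [
        "grundsätzlich",
        "grundsätzliche",
        "grundsätzlicher",
        "grundsätzlichem",
        "grundsätzlichen",
    ],
)
print(simple_prompt)
print(detailed_prompt)
```

Keeping the wording in one place makes it easier to compare the data produced by the two prompt styles across words and languages.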

All data (labeled and unlabeled) can be found here:

English: https://drive.google.com/drive/folders/1KkM-lvtcGZn2p_RQSaCixNTi-aEWPbec?usp=sharing

German: https://drive.google.com/drive/folders/1T8qwsKcfyjY-8uOJ4U7uEHX3R7BvMOhL?usp=sharing

An expert (Tracey) needs to label the data for the classifier.

Data currently labeled for context-aware false positives in the ML model:

English sentences:

List of words: dynamic, impact, fossil, flexible, best, retard, brilliant, retarded, alone

German sentences:

List of words: unabhängig, entschieden

Tracking of data labelling for English and German:

https://www.notion.so/witty-works/Data-flagging-for-Elena-2nd-group-green-highlight-cc492c702c574521902ba394176585ef?pvs=4