Context-aware false-positive checker

Setup

We currently use ChatGPT to generate training data with two different prompts:

The first is very simple and names only the base word; the second lists all possible forms of the word. For example:

“Please provide 30 sentences in German using word grundsätzlich”
“Please provide 30 sentences in German using words: grundsätzlich, grundsätzliche, grundsätzlicher, grundsätzlichem, grundsätzlichen in different context and in different place inside of the sentence”
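
The two prompt styles above could be produced from a word list with a small helper. This is a minimal sketch, not the code actually used: the function names `simple_prompt` and `detailed_prompt` and their parameters are hypothetical, and the prompt wording is copied from the examples above. The list of inflected forms is assumed to be supplied by hand.

```python
from typing import Sequence


def simple_prompt(word: str, language: str = "German", n: int = 30) -> str:
    """Build the simple prompt that names only the base word (hypothetical helper)."""
    return f"Please provide {n} sentences in {language} using word {word}"


def detailed_prompt(forms: Sequence[str], language: str = "German", n: int = 30) -> str:
    """Build the detailed prompt that lists every form of the word (hypothetical helper)."""
    joined = ", ".join(forms)
    return (
        f"Please provide {n} sentences in {language} using words: {joined} "
        "in different context and in different place inside of the sentence"
    )


if __name__ == "__main__":
    # Forms taken from the example above; supplied manually, not generated.
    forms = [
        "grundsätzlich",
        "grundsätzliche",
        "grundsätzlicher",
        "grundsätzlichem",
        "grundsätzlichen",
    ]
    print(simple_prompt(forms[0]))
    print(detailed_prompt(forms))
```

Keeping the prompt text in one place makes it easier to generate data for new words or for English with identical wording.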

All data (labeled and unlabeled) can be found here:

English: https://drive.google.com/drive/folders/1KkM-lvtcGZn2p_RQSaCixNTi-aEWPbec?usp=sharing

German: https://drive.google.com/drive/folders/1T8qwsKcfyjY-8uOJ4U7uEHX3R7BvMOhL?usp=sharing

An expert (Tracey) needs to label the data for the classifier.

Currently labeled data for context-aware false positives in the ML model:

English sentences:

List of words: dynamic, impact, fossil, flexible, best, retard, brilliant, retarded, alone

German sentences:

List of words: unabhängig, entschieden