Before doing any front-end PHP, let's tackle the hard part first: the backend logic. Once I get a working prototype with mock data, I'll get a UI running.

Main idea: compare the student response with the professor's answer key semantically (semantic similarity).

I can see that the example asks for 2 features. This may be an issue for semantic similarity, because I think it's more accurate to compare 1:1: break the response down into 2 parts and perform semantic similarity on each part individually. Breaking the response down and matching each part to the parallel professor answer is a task well suited to an LLM.
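A minimal sketch of the 1:1 idea, using mock data. The `embed` function is a stand-in for a real embedding model, and the feature splitting/alignment (which the LLM would do) is assumed to have already happened; every name here is hypothetical.

```python
import math

def embed(text: str) -> list[float]:
    # Mock embedding: a character-frequency vector over a-z.
    # A real sentence encoder would replace this.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def grade(student_features: list[str], key_features: list[str]) -> list[float]:
    # Assumes the LLM already split the response and aligned it 1:1
    # with the answer key, so we just score each parallel pair.
    return [cosine(embed(s), embed(k))
            for s, k in zip(student_features, key_features)]

scores = grade(
    ["the function runs in linear time", "it uses constant memory"],
    ["time complexity is linear in n", "space used is constant"],
)
```

The point of the structure is that each feature gets its own score, instead of one blended score for the whole response.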

I think a more general solution is required, judging by the free-text nature of the professor's question box and rubric.

So the LLM would do more of the heavy lifting.

….

RAG vs. Semantic Similarity

For comparing student answers to the professor's answers, semantic similarity is the better fit. It gives an objective, quantitative score, in contrast to asking an LLM to be the judge, which could be subjective since LLMs are often too nice.
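To illustrate what "objective and quantitative" buys us: the mapping from a similarity score to points can be a plain deterministic function that anyone can audit and rerun, unlike an LLM verdict. The thresholds below are invented for illustration, not part of any real rubric.

```python
def points_from_similarity(sim: float, max_points: int = 10) -> int:
    # Deterministic similarity-to-points mapping (hypothetical thresholds).
    # The same input always yields the same grade, unlike an LLM judge.
    if sim >= 0.85:
        return max_points
    if sim >= 0.65:
        return int(max_points * 0.7)
    if sim >= 0.45:
        return int(max_points * 0.4)
    return 0
```

Because the function is pure, the grading policy can be tuned (or disputed) by changing thresholds, with no prompt engineering involved.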

RAG is more useful when looking up information and providing additional value on top of it, which is not needed for grading, and it could be overkill for feedback at this point in development.

Semantic Similarity

I need to find a model that produces Hebrew embeddings.
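Whatever candidates come back, a tiny evaluation harness makes the comparison concrete: feed each candidate model a few labeled Hebrew sentence pairs and check that it ranks similar pairs above dissimilar ones. The `embed` function is pluggable; the word-hash mock below only exists so the script runs without downloading any model, and the real candidates are still to be determined.

```python
import math

def mock_embed(text: str) -> list[float]:
    # Bag-of-words hash vector; a placeholder for a real Hebrew encoder.
    vec = [0.0] * 64
    for word in text.split():
        vec[hash(word) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_accuracy(embed, similar_pairs, dissimilar_pairs) -> float:
    # Fraction of (similar, dissimilar) comparisons where the model
    # scores the similar pair higher. Closer to 1.0 is better.
    wins = total = 0
    for s1, s2 in similar_pairs:
        for d1, d2 in dissimilar_pairs:
            total += 1
            if cosine(embed(s1), embed(s2)) > cosine(embed(d1), embed(d2)):
                wins += 1
    return wins / total if total else 0.0
```

Running the same labeled pairs through every suggested model and comparing `rank_accuracy` gives a single number per candidate, which is much easier to argue about than eyeballing outputs.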

I asked every LLM for suggestions; the next step is to compare the results across all of them.