It is a technique that converts words into vectors. It can be learned in two ways → Count or Frequency = { BOW, TF-IDF, OHE }
→ Deep learning trained model = { Word2vec → { CBOW, Skip-Gram }, Embedding layers }
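A minimal sketch of the count/frequency family using scikit-learn (the toy corpus and default parameters are assumptions for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["Krish channel is related to data science"]  # assumed toy corpus

# Bag of Words: raw counts per word -> sparse matrix of 0s and small integers
bow = CountVectorizer().fit_transform(corpus)

# TF-IDF: counts re-weighted by how rare each word is across documents
tfidf = TfidfVectorizer().fit_transform(corpus)

print(bow.toarray())    # e.g. [[1 1 1 1 1 1 1]] for the 7 unique words
print(tfidf.toarray())
```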
Word2Vec solves the problems of BOW and TF-IDF: loss of semantic meaning, and a sparse matrix that is very huge and full of 1s and 0s. Word2Vec solves these problems because it has → limited dimensions → fixed-size dense representation → reduced sparsity → vectors are dense, not full of zeros → semantic meaning is maintained → words with similar meanings have similar vector representations.
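A minimal sketch of training Word2Vec (CBOW) with gensim; the corpus, vector_size, and window values are illustrative assumptions, not from the notes:

```python
from gensim.models import Word2Vec

# Assumed toy corpus: each sentence is a list of tokens
sentences = [
    ["krish", "channel", "is", "related", "to", "data", "science"],
    ["data", "science", "channel", "with", "machine", "learning", "videos"],
]

# sg=0 selects CBOW; vector_size is the fixed dense dimension, window the context size
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)

vec = model.wv["data"]                   # dense 50-dimensional vector, not sparse 0/1
similar = model.wv.most_similar("data")  # words with similar meaning get similar vectors
print(vec.shape, similar[:3])
```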
Word2Vec represents each word with many features (gender, age, food, color, …) and puts a value for each feature, as in the toy sketch below.
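A toy illustration of this feature view (the feature names and values are made up for illustration; real Word2Vec dimensions do not carry human-readable labels):

```python
import numpy as np

# Hypothetical feature axes: [gender, age, food, color]
apple  = np.array([0.0, 0.1, 0.95, 0.7])
banana = np.array([0.0, 0.1, 0.90, 0.8])
king   = np.array([0.9, 0.8, 0.05, 0.1])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(apple, banana))  # high: similar values on the same features
print(cosine(apple, king))    # low: few features in common
```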
CBOW → predicts the target word from the surrounding context words.
⇒ Context Window → the number of surrounding words used to predict a target word. Ex: "Krish channel is related to data science", context window size = 5 (see the table and the sketch after it).
| Independent Features (context) | Output (target) |
|---|---|
| Krish, channel, related, to | is |
| channel, is, to, data | related |
| is, related, data, science | to |
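A minimal sketch of how these (context, target) pairs can be generated, assuming window size 5 means 2 context words on each side of the target:

```python
sentence = ["Krish", "channel", "is", "related", "to", "data", "science"]
window = 5            # total window; 2 context words on each side of the target
half = window // 2

pairs = []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - half):i] + sentence[i + 1:i + 1 + half]
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
# e.g. ['Krish', 'channel', 'related', 'to'] -> is
```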
Apply a neural network: each input word is a one-hot vector (7 x 1, since the vocabulary has 7 words), the hidden layer has 5 neurons (the window size), and the output is a 7 x 1 vector that predicts the target word.
Input Layer → One-hot vectors for each context word (shape: (V x 1) for each word).
Embedding Projection → Multiply each one-hot vector by the embedding matrix (V x N) to get an N-dimensional dense vector.
Averaging/Summing → Combine all context word embeddings into a single vector of shape (N x 1); a full forward-pass sketch follows the example below.
Meaning, if I have a context with two words: Krish, Channel:
embedding("Krish") = [0.2, 0.5, 0.1]
embedding("Channel") = [0.4, 0.3, 0.9]
⇒ Summing → [0.2+0.4, 0.5+0.3, 0.1+0.9] = [0.6, 0.8, 1.0]
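A minimal numpy sketch of the whole CBOW forward pass described above (vocabulary of 7 words, embedding dimension N = 3 to match the toy vectors; the weights are random, untrained assumptions):

```python
import numpy as np

vocab = ["krish", "channel", "is", "related", "to", "data", "science"]
V, N = len(vocab), 3                        # vocabulary size and embedding dimension
word_to_idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, N))              # embedding matrix (V x N)
W_out = rng.normal(size=(N, V))             # output weights (N x V)

def one_hot(word):
    v = np.zeros(V)
    v[word_to_idx[word]] = 1.0
    return v

def cbow_forward(context_words):
    # Project each one-hot context vector through the embedding matrix, then average
    embeddings = [one_hot(w) @ W_in for w in context_words]   # each is (N,)
    hidden = np.mean(embeddings, axis=0)                      # combined (N x 1) vector
    scores = hidden @ W_out                                    # one score per vocab word
    probs = np.exp(scores) / np.exp(scores).sum()              # softmax over the vocabulary
    return vocab[int(np.argmax(probs))], probs

predicted, _ = cbow_forward(["krish", "channel", "related", "to"])
print(predicted)  # weights are untrained here, so the prediction is arbitrary
```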