It is a technique that converts words into vectors. It can be learned in two ways → Count or Frequency = { BOW, TF-IDF, OHE }
→ Deep learning trained model = { Word2vec → { CBOW, Skip-Gram }, Embedding layers }
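A minimal sketch of the count/frequency family using scikit-learn (the toy corpus and default parameters are assumptions for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["Krish channel is related to data science"]  # assumed toy corpus

# Bag of Words: raw counts per word -> sparse matrix of 0s and small integers
bow = CountVectorizer().fit_transform(corpus)

# TF-IDF: counts re-weighted by how rare each word is across documents
tfidf = TfidfVectorizer().fit_transform(corpus)

print(bow.toarray())    # e.g. [[1 1 1 1 1 1 1]] for the 7 unique words
print(tfidf.toarray())
```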
Word2Vec solves the problems of BOW and TF-IDF: loss of semantic meaning, and a sparse matrix that is very huge and full of 1s and 0s. Word2Vec solves these problems because it has → limited dimensions → fixed-size dense representation → reduced sparsity → vectors are dense, not full of zeros → semantic meaning is maintained → words with similar meanings have similar vector representations.
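A minimal sketch of training Word2Vec (CBOW) with gensim; the corpus, vector_size, and window values are illustrative assumptions, not from the notes:

```python
from gensim.models import Word2Vec

# Assumed toy corpus: each sentence is a list of tokens
sentences = [
    ["krish", "channel", "is", "related", "to", "data", "science"],
    ["data", "science", "channel", "with", "machine", "learning", "videos"],
]

# sg=0 selects CBOW; vector_size is the fixed dense dimension, window the context size
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)

vec = model.wv["data"]                   # dense 50-dimensional vector, not sparse 0/1
similar = model.wv.most_similar("data")  # words with similar meaning get similar vectors
print(vec.shape, similar[:3])
```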
Word2Vec represents each word with many features (gender, age, food, color, …) and puts a value for each feature, as in the toy sketch below.
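A toy illustration of this feature view (the feature names and values are made up for illustration; real Word2Vec dimensions do not carry human-readable labels):

```python
import numpy as np

# Hypothetical feature axes: [gender, age, food, color]
apple  = np.array([0.0, 0.1, 0.95, 0.7])
banana = np.array([0.0, 0.1, 0.90, 0.8])
king   = np.array([0.9, 0.8, 0.05, 0.1])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(apple, banana))  # high: similar values on the same features
print(cosine(apple, king))    # low: few features in common
```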
CBOW → predicts the target word from the surrounding context words.
⇒ Context Window → the number of surrounding words used to predict a target word. Ex: "Krish channel is related to data science", context window size = 5 (see the table and the sketch after it).
| Independent Features (context) | Output (target) |
|---|---|
| Krish, channel, related, to | is |
| channel, is, to, data | related |
| is, related, data, science | to |
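A minimal sketch of how these (context, target) pairs can be generated, assuming window size 5 means 2 context words on each side of the target:

```python
sentence = ["Krish", "channel", "is", "related", "to", "data", "science"]
window = 5            # total window; 2 context words on each side of the target
half = window // 2

pairs = []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - half):i] + sentence[i + 1:i + 1 + half]
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
# e.g. ['Krish', 'channel', 'related', 'to'] -> is
```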
Apply a neural network: each input word is a one-hot vector (7 x 1, since the vocabulary has 7 words), the hidden layer has 5 neurons (the window size), and the output is a 7 x 1 vector that predicts the target word.
Input Layer → One-hot vectors for each context word (shape: (V x 1) for each word).
Embedding Projection → Multiply each one-hot vector by the embedding matrix (V x N) to get an N-dimensional dense vector.
Averaging/Summing → Combine all context word embeddings into a single vector of shape (N x 1); a full forward-pass sketch follows the example below.
Meaning, if I have a context with two words: Krish, Channel:
embedding("Krish") = [0.2, 0.5, 0.1]
embedding("Channel") = [0.4, 0.3, 0.9]
⇒ Summing → [0.2+0.4, 0.5+0.3, 0.1+0.9] = [0.6, 0.8, 1.0]
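A minimal numpy sketch of the whole CBOW forward pass described above (vocabulary of 7 words, embedding dimension N = 3 to match the toy vectors; the weights are random, untrained assumptions):

```python
import numpy as np

vocab = ["krish", "channel", "is", "related", "to", "data", "science"]
V, N = len(vocab), 3                        # vocabulary size and embedding dimension
word_to_idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, N))              # embedding matrix (V x N)
W_out = rng.normal(size=(N, V))             # output weights (N x V)

def one_hot(word):
    v = np.zeros(V)
    v[word_to_idx[word]] = 1.0
    return v

def cbow_forward(context_words):
    # Project each one-hot context vector through the embedding matrix, then average
    embeddings = [one_hot(w) @ W_in for w in context_words]   # each is (N,)
    hidden = np.mean(embeddings, axis=0)                      # combined (N x 1) vector
    scores = hidden @ W_out                                    # one score per vocab word
    probs = np.exp(scores) / np.exp(scores).sum()              # softmax over the vocabulary
    return vocab[int(np.argmax(probs))], probs

predicted, _ = cbow_forward(["krish", "channel", "related", "to"])
print(predicted)  # weights are untrained here, so the prediction is arbitrary
```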