These are two foundational questions that touch on the practical engineering and mathematical theory of Deep Learning.
Here is the breakdown of why we do it this way.
You are correct that it is "easier," but the main reason is survival against the limits of computer arithmetic.
Probabilities are always between 0 and 1. When you train a model on a sequence of text, you are essentially computing the probability of the entire sequence. To do this, you multiply the probability of each word together:
$$P(\text{sequence}) = p_1 \times p_2 \times p_3 \times \dots \times p_{100}$$
Computers have a hard limit on how small a number they can store (floating-point underflow). Multiply enough tiny numbers together and the result collapses to exactly zero, taking the gradient signal with it.
The Log Fix:
Logarithms map these tiny numbers to manageable negative numbers.
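Because the log of a product is the sum of the logs, the sequence probability above becomes:

$$\log P(\text{sequence}) = \log p_1 + \log p_2 + \dots + \log p_{100}$$

A sum of ordinary negative numbers never underflows the way a product of tiny numbers does.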
For example, a probability of $10^{-100}$ has a natural log of roughly $-230.25$. A computer stores $-230.25$ easily. It cannot handle 0.000...001.
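Here is a minimal sketch of the problem and the fix. The sequence length (2000) and the per-token probability (0.5) are illustrative values chosen to force underflow, not numbers from any real model:

```python
import math

# A toy "sequence" of 2000 tokens, each assigned probability 0.5.
# (Illustrative values, not from a real language model.)
probs = [0.5] * 2000

# Naive product: drops below float64's smallest representable
# value (~5e-324) and collapses to exactly 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Log-space sum: the same quantity, stored as an ordinary
# negative number that float64 handles with no trouble.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # roughly -1386.29, i.e. 2000 * ln(0.5)
```

The log-space version loses no information: you can always compare two sequences by their log-probabilities directly, since log is monotonic.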