The short answer is: Mathematically, it is arbitrary. But physically and psychologically, "Descent" is much more intuitive.
There is no "deep mathematical truth" that makes Descent better than Ascent. If you multiplied every loss function in PyTorch by -1 and switched to Gradient Ascent, the model would take exactly the same steps: moving up the gradient of −L is the same update as moving down the gradient of L.
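To make the "multiply by -1" claim concrete, here is a minimal sketch in plain Python rather than PyTorch. The toy loss f(w) = (w − 3)² and the helper names are illustrative assumptions, not anything from a real library. It runs gradient descent on the loss and gradient ascent on the negated loss, then checks that both produce the identical sequence of parameter values.

```python
# Toy 1-D loss, chosen purely for illustration: f(w) = (w - 3)^2, minimum at w = 3.
def loss_grad(w: float) -> float:
    return 2.0 * (w - 3.0)

def gradient_descent(w: float, lr: float, steps: int):
    """Minimize f(w): step *against* the gradient of f."""
    path = [w]
    for _ in range(steps):
        w = w - lr * loss_grad(w)
        path.append(w)
    return path

def gradient_ascent_on_negated(w: float, lr: float, steps: int):
    """Maximize -f(w): step *along* the gradient of -f."""
    path = [w]
    for _ in range(steps):
        neg_grad = -loss_grad(w)  # gradient of the negated loss -f(w)
        w = w + lr * neg_grad
        path.append(w)
    return path

descent = gradient_descent(w=0.0, lr=0.1, steps=50)
ascent = gradient_ascent_on_negated(w=0.0, lr=0.1, steps=50)
print(descent == ascent)  # → True: same trajectory, step for step
```

The two loops differ only in where the minus sign lives, which is the whole point: the choice between "descent" and "ascent" is a sign convention, not a different algorithm.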
However, we chose Descent (Minimization) for three specific reasons, rooted in physics, engineering history, and human psychology.
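The first reason, physics, is the "deepest" one. Deep Learning borrows heavily from physics (specifically statistical mechanics), where systems naturally settle into their lowest-energy state, the way a ball rolls downhill under gravity until it comes to rest at the bottom of a valley. Minimizing a loss maps directly onto that picture.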
If we used Gradient Ascent, we would have to imagine the model climbing toward a mountain peak instead. That works mathematically, but it breaks the gravity analogy that helps researchers think about "momentum" and "friction." These are not just metaphors: momentum is a literal hyperparameter of SGD, and Adam keeps a momentum-like running average of past gradients.
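The ball-rolling-downhill picture can be sketched as classic (Polyak) heavy-ball momentum. This is a minimal illustration, not PyTorch's or Adam's implementation; the toy loss, the function name `heavy_ball`, and the values of `lr` and `beta` are all assumptions chosen for the example. The velocity `v` is the ball's speed, the gradient is the downhill slope, and multiplying `v` by `beta < 1` each step bleeds off speed, which is the "friction."

```python
# Toy loss f(w) = (w - 3)^2, an illustrative assumption; its minimum is at w = 3.
def loss_grad(w: float) -> float:
    return 2.0 * (w - 3.0)

def heavy_ball(w: float, lr: float = 0.05, beta: float = 0.9, steps: int = 200) -> float:
    """Heavy-ball momentum on the toy loss (hypothetical helper).

    v is the ball's velocity. Each step the slope pushes the ball downhill
    (-lr * gradient), while scaling v by beta < 1 removes some speed, playing
    the role of friction so the ball eventually settles in the valley.
    """
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * loss_grad(w)  # damped velocity plus downhill push
        w = w + v                         # move the ball by its current velocity
    return w

print(heavy_ball(0.0))  # ends very close to the minimum at w = 3
```

For comparison, `torch.optim.SGD` writes its momentum buffer as `b = momentum * b + grad` and updates `p -= lr * b`; with a constant learning rate and zero dampening, that is the same recurrence as above with the velocity rescaled by `-lr`.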
The second reason is historical. Machine Learning has roots in classical engineering and optimization, and in these fields we frame problems in terms of Cost or Error: quantities you naturally want to drive down.
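Because common error measures such as squared error and cross-entropy can never go below zero, "Zero Error" is a hard, solid "floor" to aim for. It feels stable: you always know how far you are from perfect.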
In contrast, "Maximum Utility" or "Maximum Fitness" can feel abstract and unbounded. It is psychologically more satisfying to say "the error is 0" than "the log-likelihood has climbed up to 0." For discrete outputs, log-likelihood is never positive, so maximizing it means climbing toward 0 from below, through negative numbers.
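The sign flip between the two framings is easy to see numerically. In this minimal sketch, `probs` holds hypothetical probabilities a classifier assigned to the correct label for two examples (in practice they would come from a softmax), and the two helper functions are illustrative names, not a library API. The log-likelihood is negative and is maximized toward 0; its negation, the negative log-likelihood (the cross-entropy loss), is positive and is minimized toward 0.

```python
import math

def log_likelihood(p_correct):
    """Sum of log-probabilities of the correct labels: never positive, best value 0."""
    return sum(math.log(p) for p in p_correct)

def negative_log_likelihood(p_correct):
    """The loss we minimize instead: never negative, best value 0."""
    return -log_likelihood(p_correct)

# Hypothetical probabilities assigned to the correct label for two examples.
probs = [0.5, 0.25]
print(round(log_likelihood(probs), 3))           # → -2.079 (maximize: climb toward 0)
print(round(negative_log_likelihood(probs), 3))  # → 2.079 (minimize: descend toward 0)
```

Both numbers carry the same information; the minimization framing simply lets you report a non-negative error that shrinks to zero.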
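Much of the early theory of optimization relied on Convex Functions (shaped like a bowl), and convex problems are conventionally stated as minimizations. In a convex function, any local minimum is also a global minimum, so "keep going downhill" is guaranteed to reach the bottom of the bowl.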
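Here is a minimal sketch of why the bowl shape made "go downhill" such a reliable recipe. The function f(x, y) = x² + 4y², the learning rate, and the starting points are illustrative assumptions. Because f is convex with a single minimum at (0, 0), plain gradient descent with a small enough step size ends at that same point no matter where it starts.

```python
def grad(x: float, y: float):
    # Gradient of the convex bowl f(x, y) = x**2 + 4 * y**2 (illustrative choice).
    return 2.0 * x, 8.0 * y

def descend(x: float, y: float, lr: float = 0.1, steps: int = 100):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Very different starting points on the walls of the bowl.
starts = [(5.0, -3.0), (-8.0, 2.0), (0.5, 9.0)]
ends = [descend(x, y) for x, y in starts]
print(ends)  # every run lands (numerically) at the bottom, (0, 0)
```

On a non-convex surface the same loop could stop in different valleys depending on the start; convexity is what rules that out.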