The short answer is: Mathematically, it is arbitrary. But physically and psychologically, "Descent" is much more intuitive.

There is no "deep mathematical truth" that makes Descent better than Ascent. If you multiplied every loss function in PyTorch by -1 and switched to Gradient Ascent, the AI would learn exactly the same way.
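This equivalence is easy to check numerically. Below is a minimal sketch (plain Python, no PyTorch required) using a toy loss L(w) = (w - 3)² chosen just for illustration: descending the gradient of L and ascending the gradient of -L produce identical trajectories.

```python
# Toy objective: L(w) = (w - 3)^2, so dL/dw = 2*(w - 3).
def grad_loss(w):
    return 2 * (w - 3)

lr = 0.1
w_descent = 0.0  # updated by gradient DESCENT on L
w_ascent = 0.0   # updated by gradient ASCENT on -L

for _ in range(50):
    # Descent: step against the gradient of L
    w_descent -= lr * grad_loss(w_descent)
    # Ascent: step along the gradient of -L, which is -dL/dw
    w_ascent += lr * (-grad_loss(w_ascent))

print(w_descent, w_ascent)  # both converge toward the minimum at w = 3
```

The two update rules are algebraically the same line of code, which is the whole point: the sign convention is a bookkeeping choice, not a learning mechanism.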

However, we chose Descent (Minimization) for three specific reasons rooted in Physics, Engineering history, and Human psychology.

1. The Physics Analogy: "Energy Landscapes"

This is the "deepest" reason. Deep Learning borrows heavily from physics (specifically statistical mechanics).

In the physical world, nature always tries to find the state of lowest energy.

When we visualize training an AI, we imagine the model as a "ball" navigating a landscape of possible errors. It is intuitive to imagine "gravity" pulling the model down into the valley of the correct answer.

If we used Gradient Ascent, we would have to imagine the model climbing a mountain toward a peak. While possible, this breaks the gravity analogy that helps researchers visualize "momentum" and "friction," which are real hyperparameters in optimizers such as SGD with momentum and Adam.
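The physics analogy is more than a metaphor: the classic "heavy ball" momentum update reads almost literally as a ball with velocity and friction. The sketch below is a simplified version of what `torch.optim.SGD(momentum=0.9)` computes, applied to a toy loss L(w) = w² chosen just for illustration.

```python
def grad(w):
    return 2 * w  # gradient of the toy loss L(w) = w^2

lr, momentum = 0.1, 0.9
w, v = 5.0, 0.0  # position (parameter) and velocity

for _ in range(200):
    # Velocity decays by the momentum factor (friction keeps 90% of it)
    # and gets a push from the current slope (gravity pulling downhill).
    v = momentum * v - lr * grad(w)
    # The ball rolls along its velocity.
    w = w + v

print(w)  # settles near the bottom of the valley at w = 0
```

With momentum near 1 the ball coasts through small bumps and oscillates before settling, exactly the behavior the landscape picture suggests.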

2. The "Cost" Mental Model

Machine Learning has roots in classical engineering and optimization. In these fields, we frame problems in terms of Cost or Error.

"Zero Error" is a very hard, solid "floor" to aim for. It feels stable.

In contrast, "Maximum Utility" or "Maximum Fitness" can feel abstract and unbounded. It is psychologically satisfying to say, "The error is 0," rather than "The log-likelihood is 0" (a log-likelihood is never positive, so maximizing it means approaching 0 from below).
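A quick numerical illustration of that asymmetry, using the probability p a model assigns to the correct label: the cross-entropy loss -log(p) is minimized toward a hard floor at 0, while the log-likelihood log(p) is maximized toward a ceiling at 0 from below.

```python
import math

for p in [0.5, 0.9, 0.99, 0.999999]:
    nll = -math.log(p)  # the loss we minimize: positive, falling toward 0
    ll = math.log(p)    # the objective we would maximize: negative, rising toward 0
    print(f"p={p}: loss={nll:.6f}, log-likelihood={ll:.6f}")
```

The numbers are mirror images, but "error shrinking to 0" reads as progress toward a solid target, while "log-likelihood rising to 0" reads as chasing a limit you never quite touch.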