Growing up, I always found the notion of "hits" in baseball to be a curious statistic. A player who hits a deep ball directly to a center fielder will fly out, but a player who hits an equally deep ball 100 feet to the fielder's left will achieve a hit. Since this precise placement appears (largely) beyond the batter's control, it seemed that hits captured a lot of randomness that did not reflect on a batter's ability to make contact with a pitch.

Over time, I became slightly more sympathetic to the hit statistic for two reasons. The first is that if the incidence of hits conditional on balls-in-play is truly random, then variation in hits is an unbiased estimate of variation in contact. Insofar as we only care about relative rankings of batters rather than absolute levels — and given a sufficiently large sample — a hit statistic is then a perfectly adequate measure of a batter's ability to hit a ball. Second, I gradually came to understand that contact with the ball is more likely to result in a hit if the batter hits the ball squarely (i.e., as intended), and so better players will obtain more hits.
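The first argument can be checked with a toy simulation. The sketch below is only an illustration under made-up assumptions: three hypothetical batters with different contact rates, and a hit probability given contact that is identical for all of them (the "truly random" case). None of these numbers come from real data.

```python
import random


def simulate_hits(contact_rate, hit_given_contact, n_pa, rng):
    """Count hits over n_pa plate appearances for one hypothetical batter."""
    hits = 0
    for _ in range(n_pa):
        made_contact = rng.random() < contact_rate
        # Conditional on contact, a hit is a coin flip that ignores batter skill.
        if made_contact and rng.random() < hit_given_contact:
            hits += 1
    return hits


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed values, chosen only for illustration.
contact_rates = {"A": 0.60, "B": 0.70, "C": 0.80}
HIT_GIVEN_CONTACT = 0.30
N_PA = 20_000  # deliberately large sample

hits = {
    name: simulate_hits(rate, HIT_GIVEN_CONTACT, N_PA, rng)
    for name, rate in contact_rates.items()
}

# Order the batters from lowest to highest under each measure.
rank_by_contact = sorted(contact_rates, key=contact_rates.get)
rank_by_hits = sorted(hits, key=hits.get)
print(rank_by_contact, rank_by_hits)
```

With a sample this large the two orderings agree. Shrinking `N_PA` to a few dozen plate appearances makes disagreements common, which is the "sufficiently large sample" caveat at work.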

Of course, these two explanations are somewhat at odds with one another: the first holds that contact is a sufficient statistic for the ability of the player, while the second says we're better off using hits directly. And while the ability of a player is fundamentally untestable, what we can look at is whether the incidence of hits conditional on contact is indeed random.

Anyway, here's one chart showing the number of hits per contact over time.

Figure 1: Hits-per-Contact Over Time. Each point represents the number of hits in the season divided by the number of contacts.

where hits per contact is defined as

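$$ HPC = \frac{H}{PA-BB-HBP-SO} $$

Here $H$ is hits, $PA$ is plate appearances, $BB$ is walks, $HBP$ is hit-by-pitches, and $SO$ is strikeouts.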

Somewhat surprisingly, hits per contact has trended up fairly steadily, suggesting that hits are not as random as I might have expected. Players have become more efficient over time at converting contact with a ball into a base hit.
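For concreteness, here is a minimal sketch of how the hits-per-contact ratio defined above could be computed from season totals. The `SeasonTotals` container and the example numbers are hypothetical, invented for illustration; they are not the data behind Figure 1.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Season-level counting stats (hypothetical container, not a real API)."""
    h: int    # hits
    pa: int   # plate appearances
    bb: int   # walks
    hbp: int  # hit-by-pitches
    so: int   # strikeouts


def hits_per_contact(s: SeasonTotals) -> float:
    """HPC = H / (PA - BB - HBP - SO)."""
    contacts = s.pa - s.bb - s.hbp - s.so
    if contacts <= 0:
        raise ValueError("no contacts recorded; HPC is undefined")
    return s.h / contacts


# Made-up totals: 1200 - 100 - 10 - 90 = 1000 contacts.
example = SeasonTotals(h=300, pa=1200, bb=100, hbp=10, so=90)
print(hits_per_contact(example))
```

Applying the same function to each season's totals would give one point per year, as in the figure.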

One can tell a number of stories for what's going on here. One account, suggested by a friend, is that exit velocities have increased. If exit velocities are higher, balls-in-play are harder to field, harder to track down, or more likely to leave the park — all of which translate into higher hits per contact. Though I don't have data on exit velocities directly, the average baseball player has gotten taller and heavier over time with a presumably commensurate increase in swing velocity.

Figure 2: Player Size in the MLB. Each point represents the average height across players whose debut occurred in the given year.

We can also look directly at the frequency of home runs and doubles, both of which have increased, which suggests that batters are hitting more powerfully (triples are more about baserunning than power).

Figure 3: Outcomes per At-Bat. Each point represents the total outcomes in the season divided by the total at-bats.

Another (not exclusive) story is that players have become more disciplined and discriminating at the plate. If players only swing at pitches in their comfort zone, then each contact is more likely to be clean, and hence more likely to be a hit as well. However, hits per at-bat and hits per plate appearance are both down:

Figure 4: Hit Rates Over Time.

There are many other stories to tell. Maybe fielding has gotten worse, as teams select heavily on batting advantage over ability in the field. Maybe the rise of multi-purpose stadiums in the 1960s and 1970s, with artificial turf, larger foul territories, and shorter outfields, made base hits more likely. Maybe the rise in home-run rates mechanically raises hits per contact. Maybe players are less aggressive in their base running (as suggested by the decrease in triples).
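The "mechanical" home-run story can be made concrete with a small decomposition. This is my own framing rather than anything estimated from the data: split contacts into home runs, which are always hits, and everything else, which becomes a hit with some fixed probability. The rates below are invented for illustration.

```python
def hpc_from_components(hr_share: float, other_hit_rate: float) -> float:
    """Hits per contact when a share hr_share of contacts are home runs
    and the remaining contacts become hits at rate other_hit_rate."""
    if not (0.0 <= hr_share <= 1.0 and 0.0 <= other_hit_rate <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    # Every home run is a hit; other contacts are hits only some of the time.
    return hr_share + (1.0 - hr_share) * other_hit_rate


# Hypothetical values: same hit rate on non-home-run contacts,
# but twice the home-run share in the second case.
low_hr = hpc_from_components(hr_share=0.02, other_hit_rate=0.30)
high_hr = hpc_from_components(hr_share=0.04, other_hit_rate=0.30)
print(low_hr, high_hr)
```

Because a home run replaces a contact that would otherwise have been a hit only some of the time, a higher home-run share raises hits per contact even when nothing else about hitting or fielding changes.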