How much better is a 96mph fastball than a 95mph fastball?

While hiking a few months ago Jeff and I were talking about baseball and specifically, the fastball. To me, it feels like once you get into the upper 90’s the speeds seem to be idolized in a way that may not make sense. Does the actual velocity matter? Is it that much harder to bat against a ball going a mile per hour faster? In essence:

How much better is a 96mph fastball than a 95mph fastball?



I couldn’t find this anywhere so I took a stab at doing it myself. I looked at three stats:

  • Contact% = Pitches on which contact was made / Swings
  • Swing% = Swings / Pitches
  • SwStr% = Swings and misses / Total pitches

These stats were chosen because they made sense to me and could be calculated from individual pitches, as opposed to needing the data from the entire at bat.


For the data, I used Clayton Kershaw’s pitches over his entire MLB career (2008-present). Why Clayton Kershaw? Because I don’t know baseball very well and someone told me he throws both fast and a lot. This gave me ~16,000 data points. Here are the results from that data:




Contact%, defined by [Pitches on which contact was made / Swings], goes down, as most people predicted. For every mph you increase the velocity, the contact percentage drops an average of 2%, which seems significant. In essence, it’s harder to make contact with faster pitches.




Swing%, defined by [Swings / Pitches], goes up, as everyone I asked predicted. For every mph you increase the velocity, batters swing about 3% more often. Across the whole range, the fastest of his pitches are swung at about 40% more than his slowest pitches, which is a huge gap. Batters swing more at faster pitches.




SwStr%, defined by [Swings and misses / Total pitches], also goes up, as everyone I asked predicted. This is the most striking to me: batters swing and miss on average twice as much when the fastball is 95mph compared to when it’s 92mph.



Faster fastballs are better. The data we looked at clearly showed that all three of these stats get better for the pitcher as you increase the velocity. And the titular question can be answered now:

Compared with a 95mph fastball, a 96mph fastball draws contact about 2% less often, is swung at about 3% more often, and is swung on and missed about 1.5% more often.



The code can be found here. I used Python 3.x with pandas and matplotlib to parse it.

Data points are taken by putting the slowest 200 pitches into a bucket. That bucket is then checked for the stat in question and its velocities are averaged. The same is done for the next-slowest 200 pitches, and so on. The bucket size of 200 was chosen because it was the smallest number that showed the results without being unnecessarily noisy.
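The bucketing described above can be sketched in a few lines. This is not the linked code, just a minimal illustration in plain Python. The tuple layout `(velocity_mph, swung, made_contact)` and the name `bucket_stats` are assumptions made for the example, and leftover pitches that don't fill a whole bucket are simply dropped.

```python
def bucket_stats(pitches, bucket_size=200):
    """Group pitches into velocity-sorted buckets and compute the three stats.

    pitches: list of (velocity_mph, swung, made_contact) tuples.
    This layout is hypothetical, not the format of the original data.
    """
    ordered = sorted(pitches, key=lambda p: p[0])  # slowest pitches first
    points = []
    # Step through full buckets only; a trailing partial bucket is ignored.
    for start in range(0, len(ordered) - bucket_size + 1, bucket_size):
        bucket = ordered[start:start + bucket_size]
        swings = sum(1 for _, swung, _ in bucket if swung)
        contacts = sum(1 for _, swung, contact in bucket if swung and contact)
        points.append({
            "velocity": sum(v for v, _, _ in bucket) / bucket_size,
            "swing_pct": swings / bucket_size,
            "contact_pct": contacts / swings if swings else None,
            "swstr_pct": (swings - contacts) / bucket_size,
        })
    return points
```

Each returned dictionary is one point on the graphs: the bucket's average velocity on the x-axis and one of the three percentages on the y-axis.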

Obvious extensions would be doing this for other pitchers (does it extend beyond 96? Is every pitcher’s graph similar?) and other stats (Z/O-Swing, Z/O-Contact, basically everything here)



A Tale of Two Endpoints

Another week, another Riddler. Here is this week’s problem:

You’ve just been hired to work in a juicy middle-management role at Riddler HQ — welcome aboard! We relocated you to a tastefully appointed apartment in Riddler City, five blocks west and 10 blocks south of the office. (The streets of Riddler City, of course, are laid out in a perfect grid.) You walk to work each morning and back home each evening. Restless and inquisitive mathematician that you are, you prefer to walk a different path along the streets each time. How long can you stay in that apartment before you are forced to walk the same path twice? (Assume you don’t take paths that are longer than required, and assume beaucoup bonus points for not using your computer.)

Extra credit: What if you instead took a bigger but more distant apartment, M blocks west and N blocks south of the office?

This problem is pretty easy if you are able to think about it from the right point of view.

What we want is a list of directions, each either S (south) or W (west). For example, one valid route is SSSSSSSSSSWWWWW (we can’t have E or N because they would make our route inefficient). Now, we want to reorder these letters to count every possible route. Using high school math, we know that the number of permutations with repeated elements follows the form:

\frac{N!}{A! \times B! \times C!}


Where the set of \text{\small N} letters has \text{\small A} identical items, \text{\small B} identical items, \text{\small C} identical items, etc…

Using this we get our solution to be:

\frac{15!}{10! \times 5!} = \text{\small 3003 trips}
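As a quick check of the arithmetic, and of the extra-credit case with M blocks west and N blocks south, the same formula can be evaluated directly. This is a small sketch; the function name `count_routes` is made up for illustration.

```python
from math import comb, factorial


def count_routes(west, south):
    """Number of shortest grid routes: (west + south)! / (west! * south!)."""
    return factorial(west + south) // (factorial(west) * factorial(south))


routes = count_routes(5, 10)
# Choosing which 5 of the 15 steps go west is the same count.
assert routes == comb(15, 5)

# Two walks per day (to work and back home), so full days before a repeat:
full_days = routes // 2
print(routes, full_days)
```

Calling `count_routes(M, N)` with any other block counts answers the extra-credit question in the same way.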


This can be confirmed with Python, which also gives us this heatmap of the traveler’s path


There ya go! You can work 1501 days, or about 4 years!

import matplotlib.pyplot as plt
import seaborn as sns

south = 10
east = 5

distance = south + east
correct_route = []
heatmap = [[0 for e in range(max(south,east)+3)] for s in range(max(south,east)+3)]
for attempt in range(2**distance):
    route = bin(attempt)[2:]

    #if we go east exactly the right number of times then route is "correct". We do this by counting moves east
    easterness = 0
    for element in route: easterness += int(element)
    if easterness == east:

        #bin doesn't prepend 0's... we have to manually
        while len(route) != distance: route = '0'+route

        #Add route, then go through and add locations we step on to heatmap
        location = [1,1+south]
        for element in route:
            heatmap[location[1]][location[0]] +=1
            if int(element) == 1: location[0] +=1
            else: location[1] -= 1
        heatmap[location[1]][location[0]] +=1

#Format and plot
with sns.axes_style("white"):
    ax = sns.heatmap(heatmap, vmax=3500, square=True,  cmap="YlGnBu", cbar = True)

    ax.set_title("Heatmap of Commute")

The Eccentric Billionaire and the Banker

Another week, another Riddler. Here is this week’s problem:

An eccentric billionaire has published a devilish math problem that she wants to see solved. Her challenge is to three-color a specific map that she likes — that is, to color its regions with only three colors while ensuring that no bordering regions are the same color. Being an eccentric billionaire, she offers $10 million to anyone who can present her with a solution.

You come up with a solution to this math problem! However, being a poor college student, you cannot come up with the $10,000 needed to travel to the billionaire’s remote island lair. You go to your local bank and ask the manager to lend you the $10,000. You explain to him that you will soon be winning $10 million, so you will easily be able to pay back the loan. But the manager is skeptical that you actually have a correct solution.

Of course, if you simply hand the manager your solution, there is nothing preventing him from throwing you out of his office and collecting the $10 million for himself. So, the question is: How do you prove to the manager that you have a solution to the problem without giving him the solution (or any part of the solution that makes it easy for him to reproduce it)?

Oh boy, okay so here is how I do it. First, we look at the map and number it:


Next, you and the banker come up with a continuous path through the regions that crosses every border each region has (repeating is okay):


Now look at the order of regions this creates (starting/ending at the green arrow):

1, 2, 5, 3, 5, 2, 3, 4, 3, 6, 5, 1, 6, 1

Then the banker leaves, and we rotate the order of the path however we like:

original order:

1, 2, 5, 3, 5, 2, 3, 4, 3, 6, 5, 1, 6, 1

rotated one order:

2, 5, 3, 5, 2, 3, 4, 3, 6, 5, 1, 6, 1, 2

rotated two order:

5, 3, 5, 2, 3, 4, 3, 6, 5, 1, 6, 1, 2, 5

We then put tokens face down corresponding to each region’s color in the rotated order we picked. In our example solution:


If we used the “original order” the tokens would be:

(mind you they are face down)

R, Y, G, R, G, Y, R, Y, R, Y, G, R, Y, R


The banker is then brought back in and allowed to flip over any two adjacent tokens. If the coloring satisfies the rules, he will never flip two of the same color.

This process of rotating the path order, setting up the tokens and having the banker flip two can be repeated as many times as required.
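The rounds described above can be simulated with a short sketch. It only mirrors the steps in this post: the region colors are read off the token example above, the path is treated as a closed loop (which is how the rotated orders above are formed), and the function names are invented for illustration.

```python
import random

# Path agreed with the banker, starting and ending at region 1.
PATH = [1, 2, 5, 3, 5, 2, 3, 4, 3, 6, 5, 1, 6, 1]
# Region colors inferred from the token example above (R, Y, G).
COLORS = {1: "R", 2: "Y", 3: "R", 4: "Y", 5: "G", 6: "Y"}


def rotate(path, shift):
    """Rotate the closed path so that it starts (and ends) `shift` steps later."""
    loop = path[:-1]  # drop the repeated endpoint
    shift %= len(loop)
    turned = loop[shift:] + loop[:shift]
    return turned + [turned[0]]  # close the loop again


def lay_tokens(path, colors, shift):
    """Face-down tokens, one per step of the rotated path."""
    return [colors[region] for region in rotate(path, shift)]


def run_round(path, colors, rng):
    """One round: secret rotation, then the banker flips two adjacent tokens."""
    tokens = lay_tokens(path, colors, rng.randrange(len(path) - 1))
    i = rng.randrange(len(tokens) - 1)  # banker's choice of adjacent pair
    return tokens[i] != tokens[i + 1]   # True if the two colors differ


rng = random.Random(0)
print(all(run_round(PATH, COLORS, rng) for _ in range(1000)))
```

With a valid coloring every round passes, while a coloring with two matching neighbors would eventually be caught when the banker happens to flip that pair.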

Game Theory on a Number Line

Another week, another Riddler. Here is a fun problem:

Ariel, Beatrice and Cassandra — three brilliant game theorists — were bored at a game theory conference (shocking, we know) and devised the following game to pass the time. They drew a number line and placed $1 on the 1, $2 on the 2, $3 on the 3 and so on to $10 on the 10.

Each player has a personalized token. They take turns — Ariel first, Beatrice second and Cassandra third — placing their tokens on one of the money stacks (only one token is allowed per space). Once the tokens are all placed, each player gets to take every stack that her token is on or is closest to. If a stack is midway between two tokens, the players split that cash.

How will this game play out? How much is it worth to go first?

To solve this we have to assume each player is a perfect logician, then work in reverse. Say we know exactly where Ariel and Beatrice have placed their tokens. In that case we can find the place for Cassandra to put her token that maximizes her earnings. Using this we can back out the optimal place for Beatrice to put her token after each of Ariel’s moves: it is the spot where Beatrice earns the most once Cassandra makes her best reply. Using this we can back out Ariel’s best move: it’s the spot that is best for Ariel even when Beatrice and Cassandra both use their best moves. Here’s an example:

#Of the form: 

# [[A's token place (1 indexed), B's token place, C's token place], [A's winnings, B's winnings, C's winnings]]

[[1, 5, 2], [1, 49, 5]]
[[1, 5, 3], [2.0, 47.0, 6.0]]
[[1, 5, 4], [3, 45, 7]]
[[1, 5, 6], [4.5, 10.5, 40]]
[[1, 5, 7], [4.5, 13.5, 37.0]]
[[1, 5, 8], [4.5, 16.5, 34]]
[[1, 5, 9], [4.5, 20.0, 30.5]]
[[1, 5, 10], [4.5, 23.5, 27]]

From this we can figure out if A places token on 1 and B places on 5 then C will always place on 6. Here is an example for B’s decision:

#Of the form:

# [[token places (1 indexed)], [winnings]]

[[1, 2, 3], [1, 2, 52]]
[[1, 3, 4], [2.0, 4.0, 49]]
[[1, 4, 5], [3, 7, 45]]
[[1, 5, 6], [4.5, 10.5, 40]]
[[1, 6, 7], [6, 15, 34]]
[[1, 7, 8], [8.0, 20.0, 27]]
[[1, 8, 7], [8.0, 27, 20.0]]
[[1, 9, 8], [10, 19, 26]]
[[1, 10, 9], [12.5, 10, 32.5]]

Here we determine that if A places on 1, then it is in B’s best interest to place on 8. We do this a final time and determine the players will play:

#Of the form:
# [[token places (1 indexed)], [winnings]]

[[5, 9, 8], [21, 19, 15]]

And this is the answer to our problem.
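The backward reasoning above can be written as a small recursive search. This is a sketch rather than the code used to produce the tables: the function names are made up, and when a player is indifferent between two spots it assumes she takes the lowest-numbered one.

```python
def payoffs(tokens, stacks=10):
    """Winnings for each player given 1-indexed token positions."""
    winnings = [0.0] * len(tokens)
    for stack in range(1, stacks + 1):
        distances = [abs(stack - t) for t in tokens]
        closest = min(distances)
        winners = [i for i, d in enumerate(distances) if d == closest]
        for i in winners:
            winnings[i] += stack / len(winners)  # ties split the stack
    return winnings


def best_reply(placed, stacks=10, players=3):
    """Best move for the next player, assuming later players reply optimally.

    placed: tuple of positions already taken, in turn order.
    Returns (final token positions, final winnings).
    Assumption: ties go to the lowest-numbered spot.
    """
    player = len(placed)
    best = None
    for spot in range(1, stacks + 1):
        if spot in placed:
            continue  # only one token per space
        tokens = placed + (spot,)
        if len(tokens) < players:
            tokens, result = best_reply(tokens, stacks, players)
        else:
            result = payoffs(tokens, stacks)
        if best is None or result[player] > best[1][player]:
            best = (tokens, result)
    return best


print(best_reply((1, 5)))  # Cassandra's reply once A is on 1 and B is on 5
print(best_reply((1,)))    # Beatrice's reply once A is on 1
print(best_reply(()))      # the full game
```

The first two calls correspond to the two example tables above, and the last call runs the same reasoning from Ariel's first move.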

Matching Game

Another week, another Riddler. I really liked this one, if a bit clear cut. The problem:

I have a matching game app for my 4-year-old daughter. There are 10 different pairs of cards, each pair depicting the same animal. That makes 20 cards total, all arrayed face down. The goal is to match all the pairs. When you flip two cards up, if they match, they stay up, decreasing the number of unmatched cards and rewarding you with the corresponding animal sound. If they don’t match, they both flip back down. (Essentially like Concentration.) However, my 1-year-old son also likes to play the game, exclusively for its animal sounds. He has no ability to match cards intentionally — it’s all random.

If he flips a pair of cards every second and it takes another second for them to either flip back over or to make the “matching” sound, how long should my daughter expect to have to wait before he finishes the game and it’s her turn again?

To solve this we can look at each “level” independently, where a level is the number of pairs of cards remaining. We start at level 10 and our goal is to get to level 0. To figure out the estimated amount of time on, for example, level 10 we use the equation:

t_{10} = \frac{1}{19}(2) + \frac{18}{19}(2+t_{10})

This says the first card selection doesn’t matter, but after selecting it we have to select the second card and only one of the 19 remaining cards matches it. The equation shows that there is a 1/19 chance we correctly select the card and proceed, at a cost of two seconds. There is also an 18/19 chance we fail, which costs us two seconds plus the time to successfully complete the level. Solving this gives us:

t_{10} = 38

Extending this we can see that solving for a generalized level is:

t_{n} = \frac{1}{2n-1}(2) + \frac{2n-2}{2n-1}(2+t_{n})
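Collecting the t_{n} terms on one side gives a closed form for each level, which makes the totals easy to check by hand:

t_{n} = 2 + \frac{2n-2}{2n-1}t_{n} \implies \frac{1}{2n-1}t_{n} = 2 \implies t_{n} = 2(2n-1)

For n = 10 this reproduces t_{10} = 38, and for a game with N pairs the total over all levels is:

\sum_{n=1}^{N} 2(2n-1) = 2N^{2}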

Summing levels 1-10 gives us 200 seconds. This was confirmed by simulating it in Python.

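import random, math

def run_trial(card_pairs):
    time = 0
    #Two cards per animal: [0, 0, 1, 1, 2, 2, ...]
    card_set = [math.floor(i/2) for i in range(card_pairs*2)]
    while len(card_set):
        #Pick two different cards at random by shuffling the indices
        selection_index = [i for i in range(len(card_set))]
        random.shuffle(selection_index)
        if card_set[selection_index[0]] == card_set[selection_index[1]]:
            #Match: both cards of that animal leave the game
            value = card_set[selection_index[0]]
            card_set = [card for card in card_set if card != value]
        #Every flip costs two seconds, match or not
        time += 2
    return time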

sum_time = 0
trials = 10000
for i in range(trials): sum_time += run_trial(10)
print("Average time: " + str(sum_time/trials))


A Classic Construction Problem

Another week, another Riddler. The question:

Consider four towns arranged to form the corners of a square, where each side is 10 miles long. You own a road-building company. The state has offered you $28 million to construct a road system linking all four towns in some way, and it costs you $1 million to build one mile of road. Can you turn a profit if you take the job?

Extra credit: How does your business calculus change if there were five towns arranged as a pentagon? Six as a hexagon? Etc.?

After a few napkin drawings I landed on this shape:


Then I set up this series of equations:


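Plugging that into Python we get an answer of 27.3205 miles of road with an x_1 distance of 2.887 miles. At $1 million per mile that costs about $27.32 million, so the $28 million contract leaves a profit of roughly $0.68 million.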
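Since the drawing and equations are not reproduced here, the following is only a sketch under an assumed layout: two junction points on the square's center line, each a distance x in from the midpoint of the nearest side, joined to the two nearest towns and to each other. The names `road_length` and `minimize` are made up for the example.

```python
import math

SIDE = 10.0  # miles between neighboring towns


def road_length(x):
    """Total road for the assumed two-junction layout.

    x: how far each junction sits inward from the midpoint of its nearest
    side. Four spokes run from the towns to the junctions, plus one
    connecting segment between the junctions.
    """
    half = SIDE / 2
    return 4 * math.hypot(half, x) + (SIDE - 2 * x)


def minimize(f, lo, hi, steps=200):
    """Ternary search for the minimum of a convex function on [lo, hi]."""
    for _ in range(steps):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2


best_x = minimize(road_length, 0.0, SIDE / 2)
print(round(best_x, 3), round(road_length(best_x), 4))
```

Under this assumed layout the minimum agrees with the figures quoted above.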


Pretty simple but fun!