Elf Playlist

Another week, another riddler. Here is this week’s problem:

In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.

During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.

Help Mathy out: How large is Santa’s playlist?

To solve this, we find the smallest playlist size N for which the probability of hearing no repeated songs during a 100-song shift is about one half:

$P_{N} = \prod_{i=0}^{99}\left(1-\frac{i}{N}\right)\approx0.5$

Each factor is the chance that a given song is not a repeat: the first song has a 0/N chance of being a repeat, so its factor is 1. This gets multiplied by (1-1/N) for the second song, (1-2/N) for the third, etc.

This results in 7176 when our shift length is 100. Next, we vary the shift length and find the resulting playlist length:

This, interestingly, follows the quadratic below for a shift length of l and a playlist length of N:

$N = 0.7214l^2 - 1.8316l+2.6343$

The leading coefficient is telling: the classic birthday-problem approximation gives N ≈ l²/(2 ln 2), and 1/(2 ln 2) ≈ 0.7213. Now we know why the shift length is 100: it's likely the total number of Christmas songs 🙂

import matplotlib.pyplot as plt

def num_songs_in_playlist(goal_prob, shift_len_in_songs):
    # Smallest playlist size where P(no repeats in a shift) reaches goal_prob
    songs_in_playlist = 200
    while True:
        prob = 1
        for i in range(shift_len_in_songs):
            prob *= 1 - i / songs_in_playlist
        if prob >= goal_prob:
            return songs_in_playlist
        songs_in_playlist += 1

x = [i for i in range(50, 200)]
y = [num_songs_in_playlist(0.5, i) for i in x]

plt.xlabel("Shift Length (songs)")
plt.ylabel("Playlist Length")
plt.scatter(x, y)
plt.title("Santa's playlist length to ensure repeating\nprobability 0.5 over various shift lengths")
plt.show()


Imagine a Hockey Game

Okay, okay, this is a fun problem. To solve it, we have to look at the distribution of team B's goals.

The chance that team B gets 0 points is

$P_{0} = (1-0.055)^{20} = 0.3226$

The chance team B gets 1 point is:

$P_{1} = 20*(0.055)*(1-0.055)^{19} = 0.3755$

and the chance team B gets 2+ points is:

$P_{2+} = 1 - P_{0} - P_{1} = 0.3019$
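These three binomial terms (20 independent chances at 5.5% each, per the formulas above) are quick to sanity-check in Python:

```python
from math import comb

p, n = 0.055, 20  # per-chance scoring probability and number of chances

p0 = (1 - p) ** n                         # team B scores 0
p1 = comb(n, 1) * p * (1 - p) ** (n - 1)  # team B scores exactly 1
p2_plus = 1 - p0 - p1                     # team B scores 2 or more

print(round(p0, 4), round(p1, 4), round(p2_plus, 4))  # 0.3226 0.3755 0.3019
```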

Comparing the endpoints, team B is more likely to score zero (0.3226) than to score two or more (0.3019), so team A comes out ahead. While the difference isn't large, it holds that team A is expected to win. Counterintuitive!

Hue Lights 3.0

This is the third iteration of my Hue lights integration. An increase in technical knowledge and the fact that I am now living alone means that I could really go all out with this.

Hue lights are RGB light bulbs that can change color in response to various triggers. This project was designed to set up some unique, custom triggers.

Part I: Dash Buttons

These are small, wireless buttons that Amazon makes so it's easy for you to buy things. However, I repurpose these buttons so they turn all the lights on and off.

This uses Scapy to sniff the network; anytime a device with the MAC address of a button makes a request, I turn the lights on if most are off, or vice versa.
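The toggle rule ("on if most are off, or vice versa") is simple majority logic. Here is a minimal sketch of just that decision; the Scapy sniffing and the Hue API calls are omitted, and `states` is a hypothetical list of each bulb's current on/off state:

```python
def toggle_target(states):
    # If most lights are off, the button should turn everything on;
    # otherwise it should turn everything off
    on_count = sum(states)
    return on_count <= len(states) / 2

print(toggle_target([False, False, True]))  # most off -> turn all on: True
print(toggle_target([True, True, False]))   # most on -> turn all off: False
```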

Part II: Weather Forecast

Every morning, when my alarm goes off, my lights will turn on and this will look up the weather in my area using WeatherUnderground and set the lights to represent the weather.

It looks up the weather code against this Excel spreadsheet, which maps each code to the color the lights will be.

Part III: Scene Controller

This is a button pad that selects from eight scenes preprogrammed for whatever activity seems “right”: there is one for movies, one for yoga, one for reading, etc. Also, it's just fun to show off the system.

The pad gets imported as a MIDI device using Pygame, which then polls for button presses. After each button press it looks up the scene code in the same spreadsheet as the weather codes.

That's it! Thanks for reading. The code can be found here

7/L Segment Display

Today I was troubleshooting a sensor that was giving an error code, but Error 52, “Over Voltage,” was not what I expected. Regardless, I checked the voltage and it looked fine. About ten minutes later I realized the display was mounted upside down and it was actually Error 25, “Programming Error.” Duh.

Which one? Choose wisely

While this is totally my fault, it is also avoidable. So if you are a company that makes things with important codes on a seven-segment display, AND your display can logically be mounted upside down, then don't use any codes that also read as valid numbers in the other orientation.

Here is the count of numbers that satisfy that criterion:

To find this I required that a number either not read as a number upside down (because it contains a 7, 4, or 3) or read as itself upside down (96 -> 96).

(Figure: counts of valid codes, e.g. for 5-digit numbers.)

Here is the code:

import matplotlib.pyplot as plt

def flip(num):
    # Upside-down appearance of each digit; the underscored entries are
    # letters (E, H, L), which can never be part of a valid number
    flips = [
        ["0", "0"],
        ["1", "1"],
        ["2", "2"],
        ["3", "E_"],
        ["4", "H_"],
        ["5", "5"],
        ["6", "9"],
        ["7", "L_"],
        ["8", "8"],
        ["9", "6"]]

    # Build the flipped string by reading the digits back to front
    new_num = ""
    for i in range(len(num)):
        new_num += flips[int(num[len(num) - 1 - i])][1]
    # Valid if the code reads the same upside down, or doesn't read as a
    # number at all (a letter entry makes new_num longer than num)
    if new_num == num or len(new_num) > len(num):
        return num
    else:
        return None

def find_all_digits(digits):
    valid_options = []
    for i in range(0, 10 ** digits):
        val = str(i)
        val = "0" * (digits - len(val)) + val
        if flip(val) is not None:
            valid_options.append(val)
    return len(valid_options)

x, y = [], []
for i in range(1, 9):  # note: 8 digits means 10**8 candidates, which is slow
    x.append(i)
    y.append(find_all_digits(i) / 10 ** i)

fig, ax = plt.subplots()
ax.scatter(x, y)
for i, j in zip(x, y):
    ax.annotate("(" + str(i) + "," + str(int(round(j * 10 ** i))) + ")",
                xy=(i - 0.075 * len(str(round(j * 10 ** i))), j + 0.01))
ax.set_title("Number of valid numbers that aren't confusing when upside down")
ax.set_xlabel("N\nNumber of Digits Used")
ax.set_ylabel("Valid Entries / Total Possible Entries")
plt.show()



Hearthstone Meets Math

There is a pretty rad game called Hearthstone, a digital card game made by Blizzard. Similar to Magic: The Gathering, two players face off using (usually) preconstructed decks of creatures and spells to defeat the opponent.

There is a mechanic in this game called discover, in which you are presented with three choices and get to select one.

For example, a card like Choose Your Path can be a “toolbox” card — if you draw it in the late game it can be a large creature or if you are getting swarmed by your opponent it can be an area-of-effect style spell. Getting this flexibility has costs, however, because it puts the selection into your hand (effectively taxing you one mana) and doesn’t guarantee you the spell you need. This flexibility makes it a powerful and skill-testing card.

Example selection choices from a “Discover” card

In the new set they revealed a new card, A New Challenger…, and discussions cropped up about how good this card truly is.

Since most discover cards have the “toolbox” aspect to them, we can't easily rank the cards offered. However, with this card you just want the biggest minion every time, which means we should be able to determine pretty accurately how good this card is.

To do this we need to use math. So to restate the problem:

Given a deck of size N with cards numbered 1-N, what is the expected value of the largest card in a three card hand?

We get that the expected value of the largest card in a hand of h cards from a pool of p cards is

$EV(\texttt{p},\texttt{h})=\dfrac{1}{\binom{\texttt{p}}{\texttt{h}}}\sum_{n=\texttt{h}}^{\texttt{p}}\left(\binom{n}{\texttt{h}}-\binom{n-1}{\texttt{h}}\right) n$

This represents taking, for each n, the difference between the number of h-card combinations available from n cards and from n-1 cards. Notably, every hand added by going from n-1 to n cards must contain n, otherwise it would already be counted among the combinations of n-1 cards! These new hands all have n as their largest card, so each contributes n; dividing by the total number of hands gives the expectation.
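This counting argument is easy to verify numerically. The sketch below compares the sum against the known closed form for the expected maximum of h draws without replacement from 1..p, E[max] = h(p+1)/(h+1):

```python
from math import comb

def ev_max(p, h):
    # Each bracketed term counts the hands whose largest card is exactly n
    total = comb(p, h)
    return sum((comb(n, h) - comb(n - 1, h)) * n for n in range(h, p + 1)) / total

print(ev_max(10, 3))           # 8.25
print(3 * (10 + 1) / (3 + 1))  # closed form h*(p+1)/(h+1): 8.25
```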

The probability of getting a score n from a pool of p cards with a hand of size h is P(p,h,n), given by the recursion (with n the card's face value):

$P(\texttt{p},\texttt{h},n)=\dfrac{\texttt{h}}{n}\left(1-\sum_{m=n+1}^{\texttt{p}}P(\texttt{p},\texttt{h},m)\right)$

In essence, this says that once you remove the probability that the largest card is some higher value m > n (the sum in parentheses), the chance that n is in the hand, and therefore is the largest card, is h/n.
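As a sanity check, the recursion (written here 1-indexed, so n is the card's face value) reproduces the direct count C(n-1, h-1)/C(p, h) for the probability that n is the largest card:

```python
from math import comb

def p_max_recursive(p, h):
    # probs[n] = P(largest card is n), filled in from the top value down
    probs = [0.0] * (p + 1)
    for n in range(p, 0, -1):
        probs[n] = (h / n) * (1 - sum(probs[n + 1:]))
    return probs[1:]

# Direct count: a hand with maximum n pairs n with h-1 smaller cards
direct = [comb(n - 1, 3 - 1) / comb(10, 3) for n in range(1, 11)]
print(all(abs(a - b) < 1e-12 for a, b in zip(direct, p_max_recursive(10, 3))))  # True
```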

Note that the plot shows the asymptote as .755; it should be .75. This is the same output as the EV equation, but divided by N so that different deck sizes are comparable to one another.

Okay so what does this tell us? The card will be, on average, 75% of the best card available.

Therefore, A New Challenger… can be evaluated as getting you roughly the 75th percentile of the ordered 6-cost minions in the Hearthstone Standard format, with Taunt and Divine Shield.

Extensions:

What if Discover showed you more than 3 cards?

Logically, you would get a better card. Here is the result if you could see four cards:

And here is a similar figure but showing for a large range of hand sizes:

This gives you the equation, where h is the number of cards you are shown:

$EV/N = 1-\dfrac{1}{\texttt{h}+1} = \dfrac{\texttt{h}}{\texttt{h}+1}$
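You can watch the normalized EV approach this limit numerically: for h cards shown, EV/N tends to h/(h+1) as the deck grows (0.75 for three cards, 0.8 for four):

```python
from math import comb

def ev_over_n(h, n):
    # Normalized expected maximum of an h-card hand from a deck 1..n
    return sum((comb(k, h) - comb(k - 1, h)) * k
               for k in range(h, n + 1)) / comb(n, h) / n

for h in (3, 4):
    # values approach h/(h+1) as the deck size grows
    print(h, [round(ev_over_n(h, n), 4) for n in (10, 100, 2000)])
```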

Here is the source code


import math
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

def nCr(n, r):
    f = math.factorial
    return f(n) // f(r) // f(n - r)

def EV(h, d):
    # Deck of size d, hand of size h
    points = 0
    total_combs = 0
    for i in range(h, d + 1):
        delta = nCr(i, h) - total_combs
        total_combs += delta
        points += delta * i
    return points / total_combs / d

def EV_recursive(d, h):
    # Deck of size d, hand of size h
    ev = 0
    p_card = [0 for i in range(d)]

    # p_card[n] is the probability that card n+1 is the largest in the hand
    for n in range(d - 1, 0, -1):
        p_card[n] = (h / (n + 1)) * (1 - sum(p_card[n + 1:]))
    for c in range(d):
        ev += p_card[c] * (c + 1)

    return ev / d

end_val_1 = EV_recursive(10000, 3)
end_val_2 = EV_recursive(10000, 4)

rolls = 3
max_sided_die = 40

y_1 = [EV(rolls, i) for i in range(rolls, max_sided_die)]
y_2 = [EV(4, i) for i in range(4, max_sided_die)]
x_1 = [i for i in range(rolls, max_sided_die)]
x_2 = [i for i in range(4, max_sided_die)]

fig, ax = plt.subplots()

# Plotting 3 card hand (fitting two equal points gives a horizontal asymptote line)
p = np.polyfit([0, 1], [end_val_1, end_val_1], 1)
ax.plot(x_1, p[0] * np.log(x_1) + p[1], '--', color='blue')
ax.annotate("y = " + str(end_val_1)[:6], xy=(3, end_val_1 + 0.01))
ax.scatter(x_1, y_1, color='blue')
blue_patch = mpatches.Patch(color='blue', label='3 cards to select from')
orange_patch = mpatches.Patch(color='orange', label='4 cards to select from')

# Plotting 4 card hand
p = np.polyfit([0, 1], [end_val_2, end_val_2], 1)
ax.plot(x_2, p[0] * np.log(x_2) + p[1], '--', color='orange')
ax.annotate("y = " + str(end_val_2)[:6], xy=(3, end_val_2 + 0.01))
ax.scatter(x_2, y_2, color='orange')

ax.legend(handles=[blue_patch, orange_patch])
ax.set_title('EV for selecting 3 or 4 cards from a deck of N cards\nnumbered 1-N and getting points equal to the largest one')
ax.set_xlabel("N\nnumber of cards")
ax.set_ylabel('EV / N\npercent of maximum score expected')
plt.show()



How much better is a 96mph fastball than a 95mph fastball?

While hiking a few months ago, Jeff and I were talking about baseball and, specifically, the fastball. To me, it feels like once you get into the upper 90s, the speeds are idolized in a way that may not make sense. Does the actual velocity matter? Is it that much harder to bat against a ball going one mile per hour faster? In essence:

How much better is a 96mph fastball than a 95mph fastball?

I couldn’t find this anywhere so I took a stab at doing it myself. I looked at three stats:

• Contact% = Pitches on which contact was made / Swings
• Swing% = Swings / Pitches
• SwStr% = Swings and misses / Total pitches
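In code, the three definitions above are just ratios of raw counts; this helper (with made-up example numbers) shows how they relate:

```python
def pitch_stats(pitches, swings, contacts):
    # Swings and misses are whatever swings didn't make contact
    misses = swings - contacts
    return {
        "Contact%": contacts / swings,
        "Swing%": swings / pitches,
        "SwStr%": misses / pitches,
    }

print(pitch_stats(pitches=1000, swings=450, contacts=350))
# {'Contact%': 0.7777..., 'Swing%': 0.45, 'SwStr%': 0.1}
```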

These stats were chosen because they made sense to me and could be calculated from individual pitches, as opposed to needing the data from the entire at bat.

For the data, I used Clayton Kershaw's pitches over his entire MLB career (2008-present). Why Clayton Kershaw? Because I don't know baseball very well and someone told me he throws both fast and a lot. This gave me ~16,000 data points; here are the results:

Contact%:

Contact%, defined by [Pitches on which contact was made / Swings], goes down, as most people predicted. For every mph you increase the velocity, the contact percentage drops an average of 2%, which seems significant. In essence, it's harder to make contact with faster pitches.

Swing%:

Swing%, defined by [Swings / Pitches], goes up, as everyone I asked predicted. For every mph you increase the velocity, batters swing about 3% more often. Looking at the range, his fastest pitches are swung at about 40% more often than his slowest pitches, which is a huge gap. Batters swing more at faster pitches.

SwStr%:

SwStr%, defined by [Swings and misses / Total pitches], also goes up, as everyone I asked predicted. This is the most striking result to me: batters swing and miss on average twice as much when the fastball is 95mph compared to when it's 92mph.

Conclusion:

Faster fastballs are better. The data we looked at clearly showed that all three of these stats get better for the pitcher as you increase the velocity. And the titular question can be answered now:

Notes:

The code can be found here. I used Python 3.x with pandas and matplotlib to parse and plot the data.

Data points are taken by sorting the pitches by velocity and putting each consecutive group of 200 into a bucket. Each bucket is then checked for the stat in question and its velocities are averaged. The bucket size of 200 was chosen because it was the smallest number that showed the results without being unnecessarily noisy.
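The bucketing can be sketched with pandas; the column names here ('velocity', plus a 0/1 stat column) are assumptions about the parsed data, and the demo frame is synthetic:

```python
import numpy as np
import pandas as pd

def bucket_stat(df, stat_col, bucket_size=200):
    # Sort by velocity, chop into consecutive buckets of bucket_size pitches,
    # then average the velocity and the stat within each bucket
    df = df.sort_values("velocity").reset_index(drop=True)
    df["bucket"] = df.index // bucket_size
    return df.groupby("bucket")[["velocity", stat_col]].mean()

# Synthetic demo: contact that falls off roughly 2% per mph, as in the post
rng = np.random.default_rng(0)
demo = pd.DataFrame({"velocity": rng.uniform(88, 96, 2000)})
demo["contact"] = (rng.random(2000) < 1 - 0.02 * (demo["velocity"] - 88)).astype(int)
print(bucket_stat(demo, "contact"))
```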

Obvious extensions would be doing this for other pitchers (does it extend beyond 96? Is every pitcher’s graph similar?) and other stats (Z/O-Swing, Z/O-Contact, basically everything here)

A Tale of Two Endpoints

Another week, another riddler. Here is this week’s problem:

You’ve just been hired to work in a juicy middle-management role at Riddler HQ — welcome aboard! We relocated you to a tastefully appointed apartment in Riddler City, five blocks west and 10 blocks south of the office. (The streets of Riddler City, of course, are laid out in a perfect grid.) You walk to work each morning and back home each evening. Restless and inquisitive mathematician that you are, you prefer to walk a different path along the streets each time. How long can you stay in that apartment before you are forced to walk the same path twice? (Assume you don’t take paths that are longer than required, and assume beaucoup bonus points for not using your computer.)

Extra credit: What if you instead took a bigger but more distant apartment, M blocks west and N blocks south of the office?

This problem is pretty easy if you are able to think about it from the right point of view.

What we want is a list of directions, each either S (south) or W (west). For example, one valid route is SSSSSSSSSSWWWWW (we can't have E or N because they would make our route longer than necessary). Now, we want to count every possible reordering of this string. Using high school math, we know that the number of permutations with repeated elements follows the form:

$\frac{N!}{A! \times B! \times C!}$

Where the set of $\text{\small N}$ letters has $\text{\small A}$ identical items, $\text{\small B}$ identical items, $\text{\small C}$ identical items, etc…

Using this we get our solution to be:

$\frac{15!}{10! \times 5!} = \text{\small 3003 trips}$
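The count can also be confirmed directly with `math.comb`; the extra-credit answer is the same formula with M and N in place of 5 and 10:

```python
from math import comb

print(comb(15, 5))  # choose which 5 of the 15 blocks are walked west: 3003

def num_paths(m, n):
    # Extra credit: apartment m blocks west and n blocks south of the office
    return comb(m + n, m)

print(num_paths(5, 10))  # 3003
```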

This can be confirmed with Python, which also gives us this heatmap of the traveler’s path

There ya go! You can work 1501 days, or about 4 years!

import matplotlib.pyplot as plt
import seaborn as sns

south = 10
east = 5

distance = south + east
correct_route = []
heatmap = [[0 for e in range(max(south, east) + 3)] for s in range(max(south, east) + 3)]
for attempt in range(2 ** distance):
    route = bin(attempt)[2:]

    # If we go east exactly the right number of times then the route is
    # "correct". We check this by counting the moves east (the 1 bits)
    easterness = 0
    for element in route:
        easterness += int(element)
    if easterness == east:

        # bin doesn't prepend 0's... we have to manually
        while len(route) != distance:
            route = '0' + route

        # Add route, then go through and add locations we step on to heatmap
        correct_route.append(route)
        location = [1, 1 + south]
        for element in route:
            heatmap[location[1]][location[0]] += 1
            if int(element) == 1:
                location[0] += 1
            else:
                location[1] -= 1
        heatmap[location[1]][location[0]] += 1

# Format and plot
with sns.axes_style("white"):
    ax = sns.heatmap(heatmap, vmax=3500, square=True, cmap="YlGnBu", cbar=True)
    ax.axis('off')

ax.set_title("Heatmap of Commute")

plt.show()