Elf Playlist

Another week, another riddler. Here is this week’s problem:

In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.

During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.

Help Mathy out: How large is Santa’s playlist?

To solve this, we find the smallest N satisfying the following:

 

P_{N} = \prod_{i=0}^{99}\left(1-\frac{i}{N}\right)\approx0.5

 

Each term represents the chance that a song is not a repeat: the first song has a 0/N chance of being a repeat, so the chance that it is not a repeat is 1. This gets multiplied by (1 - 1/N) for the second song, (1 - 2/N) for the third, and so on.

This results in 7175 when our shift length is 100. Next, we vary the shift length and find the resulting playlist length:

Figure_1.png

This, interestingly, follows the following equation for a shift length of l and playlist length of N:

N = 0.7214l^2 - 1.8316l+2.6343
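That leading coefficient is no accident. This is the classic birthday problem: since the product above is approximately exp(-l(l-1)/2N), setting it equal to 0.5 gives N ≈ l(l-1)/(2 ln 2), and 1/(2 ln 2) ≈ 0.7213, matching the fit. A quick sketch of the approximation:

```python
import math

def approx_playlist(l):
    # First-order birthday approximation: solve exp(-l*(l-1)/(2N)) = 0.5 for N
    return l * (l - 1) / (2 * math.log(2))

print(1 / (2 * math.log(2)))   # ~0.7213, matching the fitted leading coefficient
print(approx_playlist(100))    # ~7141, slightly under the exact search result
```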

 

Now we know why the shift length is 100: it's likely the total number of Christmas songs 🙂

import matplotlib.pyplot as plt

def num_songs_in_playlist(goal_prob,shift_len_in_songs):
	#Smallest playlist size where P(no repeat in a shift) reaches goal_prob
	songs_in_playlist = 200
	while True:
		prob = 1
		for i in range(shift_len_in_songs): prob = (1-(i/songs_in_playlist)) * prob
		if prob >= goal_prob: return songs_in_playlist
		songs_in_playlist += 1

x = [i for i in range(50,200)]
y = [num_songs_in_playlist(0.5,i) for i in range(50,200)]

plt.xlabel("Shift Length (songs)")
plt.ylabel("Playlist Length")
plt.scatter(x,y)
plt.title("Santa's playlist length to ensure repeating\n probability 0.5 over various shift lengths")
plt.show()

 


Imagine a Hockey Game

 

Okay okay, this is a fun problem. To solve it we have to look at the distribution of team B's goals.

The chance that team B gets 0 points is

 

P_{0} = (1-0.055)^{20} = 0.3226

 

The chance team B gets 1 point is:

 

P_{1} = 20*(0.055)*(1-0.055)^{19} = 0.3755

 

and the chance team B gets 2+ points is:

 

P_{2+} = 1 - P_{0} - P_{1} = 0.3019
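These three numbers are just the first terms of a binomial distribution with 20 trials at p = 0.055 per trial (the same numbers used above); a quick check:

```python
from math import comb

n, p = 20, 0.055

def binom_pmf(k):
    # P(exactly k goals across n chances, each with probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

p0 = binom_pmf(0)        # ~0.3226
p1 = binom_pmf(1)        # ~0.3755
p2_plus = 1 - p0 - p1    # ~0.3019
print(p0, p1, p2_plus)
```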

 

While the difference isn't large, team A is still expected to win. Counterintuitive!

Hue Lights 3.0

This is the third iteration of my Hue lights integration. An increase in technical knowledge and the fact that I am now living alone means that I could really go all out with this.

Hue lights are RGB bulbs that can change color in response to various triggers. This project was designed to set up some unique, custom triggers.

Untitled.png

 

 

Part I: Dash Buttons

These are small, wireless buttons that Amazon makes to make it easy for you to buy things. However, I repurpose these buttons so they turn all the lights on and off.

Untitled2.png

 

This uses Scapy to sniff the network; anytime something makes a request from the MAC address of a button, I turn the lights on if most are off, or vice versa.
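The actual sniffing runs through Scapy and the Hue API, but the decision rule itself is simple enough to sketch on its own. Here `light_states` is a hypothetical list of on/off booleans, as the bridge would report them:

```python
def majority_toggle(light_states):
    """Given the current on/off state of every bulb, decide the new
    state for all of them: on if most are off, off otherwise."""
    num_on = sum(light_states)
    most_are_off = num_on <= len(light_states) / 2
    return most_are_off

# e.g. two of three lights off -> a button press turns everything on
print(majority_toggle([True, False, False]))  # True
```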

 

Part II: Weather Forecast

Every morning, when my alarm goes off, my lights turn on; the script looks up the weather in my area using WeatherUnderground and sets the lights to represent the forecast.

Capture.JPG

It looks up the weather code in an Excel spreadsheet that maps each code to the color the lights will be.
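The spreadsheet is essentially a lookup table from weather code to light color. A minimal sketch of the idea (the codes and RGB values here are made up for illustration; the real mapping lives in the spreadsheet):

```python
# Hypothetical weather-code -> RGB mapping; the real table is the spreadsheet
WEATHER_COLORS = {
    "clear":  (255, 200, 60),   # warm yellow
    "rain":   (0, 80, 255),     # blue
    "snow":   (255, 255, 255),  # white
    "cloudy": (128, 128, 128),  # grey
}

def color_for(weather_code, default=(255, 255, 255)):
    # Fall back to plain white for codes the table doesn't know
    return WEATHER_COLORS.get(weather_code, default)

print(color_for("rain"))     # (0, 80, 255)
print(color_for("tornado"))  # falls back to (255, 255, 255)
```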

Part III: Scene Controller

This is a button pad that selects from eight scenes that have been preprogrammed for whatever activity seems “right”: there is one for movies, yoga, reading, etc. Also, it's just fun to show off the system.

Untitled3.png

The pad gets imported as a MIDI device using Pygame, and the script polls it for button presses. After each press it looks up the scene code in the same spreadsheet as the weather codes.
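`pygame.midi.Input.read()` returns events shaped like `[[[status, note, velocity, extra], timestamp], ...]`, so the dispatch boils down to picking out note-on events and mapping the note number to a scene slot. A sketch of that logic (the `base_note` value is my illustrative guess, not the pad's real layout):

```python
NOTE_ON = 0x90  # MIDI status byte for note-on, channel 1

def scene_for_events(events, base_note=36):
    """Scan pygame.midi-style events and return the scene index (0-7)
    of the last button pressed, or None if nothing was pressed.
    base_note is the hypothetical note number of the pad's first button."""
    scene = None
    for (status, note, velocity, _), _timestamp in events:
        if status == NOTE_ON and velocity > 0:
            scene = note - base_note
    return scene

events = [[[0x90, 38, 100, 0], 12345]]  # third button pressed
print(scene_for_events(events))  # 2
```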

 

Capture2.JPG

 

That's it! Thanks for reading. The code can be found here

7/L Segment Display

Today I was troubleshooting a sensor that was giving an error code, but Error 52, “Over Voltage,” was not what I expected. Regardless, I checked the voltage and it looked fine. About ten minutes later I realized that the display was mounted upside down and it was actually Error 25, “Programming Error.” Duh.

 

Untitled2.jpg

Which one? Choose wisely

 

While this was totally my fault, it was also avoidable. So if you are a company that makes things with important codes on a seven segment display, and your display can plausibly be mounted upside down, then don't use any codes that also read as valid numbers in the flipped orientation.

 

614BPv1jPaL._SY355_-314x252-1.jpg

 

Here is the number of codes that satisfy that criterion:

Figure_1.png

To find this, I count a number as safe if it is either not a valid number when upside down (because it contains a 7, 4, or 3) or reads as itself when upside down (e.g. 96 -> 96).


 

Here is the code:

import matplotlib.pyplot as plt

def flip(num):
    flips = [
        ["0","0"],
        ["1","1"],
        ["2","2"],
        ["3","E_"],
        ["4","H_"],
        ["5","5"],
        ["6","9"],
        ["7","L_"],
        ["8","8"],
        ["9","6"]]

    new_num = ""
    for i in range(len(num)):
        new_num += flips[int(num[len(num)-1-i])][1]
    if new_num == num or len(new_num) > len(num): return num
    else: return

def find_all_digits(digits):
    valid_options = []
    for i in range(0,10**digits):
        val = str(i)
        val = "0"*(digits-len(val))+val
        if flip(val) is not None:
            valid_options.append(flip(val))
    return len(valid_options)

x,y = [],[]
for i in range(1,9):
    x.append(i)
    y.append(find_all_digits(i)/10**i)

fig = plt.figure()
ax = fig.add_subplot(111)
plt.scatter(x,y)
for i,j in zip(x,y):
    ax.annotate("("+str(i)[:4]+","+str(int(round(j*10**i)))+")",xy=(i-0.075*len(str(round(j*10**i))),j+0.01))
plt.title("Number of valid numbers that aren't confusing when upside down")
plt.xlabel("N\n Number of Digits Used")
plt.ylabel('Valid Entries / Total Possible Entries')
plt.show()

 

Hearthstone Meets Math

There is a pretty rad game called Hearthstone: it's a digital card game made by Blizzard. Similar to Magic: The Gathering, two players face off using (usually) preconstructed decks of creatures and spells to defeat the opponent.

There is a mechanic in this game called discover, in which you are presented with three choices and get to select one.

Choose_Your_Path(55598).png

For example, a card like Choose Your Path can be a “toolbox” card — if you draw it in the late game it can be a large creature or if you are getting swarmed by your opponent it can be an area-of-effect style spell. Getting this flexibility has costs, however, because it puts the selection into your hand (effectively taxing you one mana) and doesn’t guarantee you the spell you need. This flexibility makes it a powerful and skill-testing card.

Capture.PNG

Example selection choices from a “Discover” card

In the new set they revealed a new card, A New Challenger…, and discussions cropped up about how good this card truly is.

A_New_Challenger...(90173).png

Since most discover cards have this “toolbox” aspect, we can't easily rank the cards offered. With this card, however, you just want the biggest minion every time, which means we should be able to determine pretty accurately how good it is.

To do this we need to use math. So to restate the problem:

Given a deck of size N with cards numbered 1-N, what is the expected value of the largest card in a three card hand?

 

We get that the expected value of the largest card in a hand of \texttt{h} cards drawn from a pool of \texttt{p} cards is

EV(\texttt{p,h})=\frac{1}{\binom{\texttt{p}}{\texttt{h}}}\sum_{n=\texttt{h}}^{\texttt{p}}\left(\binom{n}{\texttt{h}}-\binom{n-1}{\texttt{h}}\right)n

 

This sums over each possible largest card n. The difference between the number of \texttt{h}-card combinations drawn from the first n cards and from the first n-1 cards counts exactly the hands that contain card n; any hand without it would already be counted among the n-1 combinations! Those hands have n as their largest card, so each contributes value n.
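As a sanity check, the formula can be compared against brute-force enumeration of every possible hand for a small deck (note also that \binom{n}{h} - \binom{n-1}{h} = \binom{n-1}{h-1}, the ways to fill the other h-1 slots from the cards below n):

```python
import itertools, math

def ev_formula(p, h):
    # EV of the largest card: sum over each possible max n of
    # (# hands whose max is exactly n) * n, divided by total hands
    total = math.comb(p, h)
    return sum((math.comb(n, h) - math.comb(n - 1, h)) * n
               for n in range(h, p + 1)) / total

def ev_brute(p, h):
    # Enumerate every hand and average its largest card
    hands = list(itertools.combinations(range(1, p + 1), h))
    return sum(max(hand) for hand in hands) / len(hands)

print(ev_formula(10, 3), ev_brute(10, 3))  # both 8.25
```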

The probability of getting a score n from a pool of p cards with a hand of size h is P(p,h,n), given by:

P(\texttt{p,h,}n)=\dfrac{\texttt{h}}{n}\left(1-\sum_{m=n+1}^{\texttt{p}}P(\texttt{p,h,}m)\right)

In essence, this says that the probability the largest card is n equals (hand_size / n) once you remove the probability that the largest card is some higher value, P(p,h,(n+1 -> p)).
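The recursion also agrees with the closed form P(p,h,n) = \binom{n-1}{h-1}/\binom{p}{h}; a quick check that the two match and that the probabilities sum to 1:

```python
import math

def probs_recursive(p, h):
    # P[n] = probability the largest card is n+1 (0-indexed, as in the code below)
    P = [0.0] * p
    for n in range(p - 1, 0, -1):
        P[n] = (h / (n + 1)) * (1 - sum(P[n + 1:]))
    return P

def probs_closed(p, h):
    # P(largest card is n) = C(n-1, h-1) / C(p, h)
    return [math.comb(n - 1, h - 1) / math.comb(p, h) for n in range(1, p + 1)]

r, c = probs_recursive(10, 3), probs_closed(10, 3)
print(max(abs(a - b) for a, b in zip(r, c)))  # ~0
print(sum(r))                                 # ~1.0
```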

figure_6.png

Note that the figure shows the asymptote as .755; it should be .75. This is the same output as the EV equation, but divided by N to make the curves comparable to one another.

 

Okay, so what does this tell us? The card you pick will be, on average, at the 75th percentile of the available pool.

Therefore, A New Challenger can be evaluated as getting you the 75th percentile of the ordered 6-Cost minions in the Hearthstone standard format with Taunt and Divine Shield.

 

Extensions:

 

What if Discover showed you more than 3 cards?

Logically, you would get a better card. Here is if you could see four cards:

figure_5.png

And here is a similar figure but showing for a large range of hand sizes:

Figure_7.png

This gives you the equation, for a selection of \texttt{h} cards:

EV/N(\texttt{h})=\dfrac{\texttt{h}}{\texttt{h}+1}=1-\dfrac{1}{\texttt{h}+1}
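A sanity check of that limit, using the same recursion as above on a large deck (an efficient version with a running tail sum, so it stays fast):

```python
def ev_over_n(d, h):
    # EV/N for the largest of h cards drawn from a deck of d cards numbered 1-d
    P = [0.0] * d
    tail = 0.0  # running sum of P[n+1:]
    for n in range(d - 1, 0, -1):
        P[n] = (h / (n + 1)) * (1 - tail)
        tail += P[n]
    return sum(P[c] * (c + 1) for c in range(d)) / d

for h in range(2, 6):
    print(h, ev_over_n(2000, h), h / (h + 1))  # the two values nearly agree
```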

Here is the source code


import itertools, math
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

def nCr(n,r):
    f = math.factorial
    return f(n) / f(r) / f(n-r)

def EV(h,d):
    #Deck of size d, hand of size h
    points = 0
    total_combs = 0
    for i in range(h,d+1):
        delta = nCr(i,h) - total_combs
        total_combs += delta
        points += delta*i
    return points / total_combs / d

def EV_recursive(d,h):
    #Deck of size d, hand of size h
    ev = 0
    p_card = [0 for i in range(d)]

    for n in range(d-1,0,-1): p_card[n] = (h/(n+1))*(1-sum(p_card[n+1:]))
    for c in range(d): ev += p_card[c]*(c+1)

    return ev / d

end_val_1 = EV_recursive(10000,3)
end_val_2 = EV_recursive(10000,4)

rolls = 3
max_sided_die = 40

y_1 = [EV(rolls,i) for i in range(rolls,max_sided_die)]
y_2 = [EV(4,i) for i in range(4,max_sided_die)]
x_1 = [i for i in range(rolls,max_sided_die)]
x_2 = [i for i in range(4,max_sided_die)]

fig = plt.figure()
ax = fig.add_subplot(111)

#Plotting 3 card hand
p = np.polyfit([0,1], [end_val_1,end_val_1], 1)
plt.plot(x_1,p[0]*np.log(x_1)+p[1],'--',color='blue')
ax.annotate("y = " + str(end_val_1)[:6],xy=(3,end_val_1+0.01))
plt.scatter(x_1,y_1, color="blue")
blue_patch = mpatches.Patch(color='blue', label='3 cards to select from')
orange_patch = mpatches.Patch(color='orange', label='4 cards to select from')

#Plotting 4 card hand
p = np.polyfit([0,1], [end_val_2,end_val_2], 1)
plt.plot(x_2,p[0]*np.log(x_2)+p[1],'--',color='orange')
ax.annotate("y = " + str(end_val_2)[:6],xy=(3,end_val_2+0.01))
plt.scatter(x_2,y_2, color = 'orange')

plt.legend(handles=[blue_patch,orange_patch])
plt.title('EV for selecting 3 or 4 cards from a deck of N cards\nnumbered 1-N and getting points equal to the largest one')
plt.xlabel("N\nnumber of cards")
plt.ylabel('EV / N\npercent of maximum score expected')
plt.show()

 

How much better is a 96mph fastball than a 95mph fastball?

While hiking a few months ago, Jeff and I were talking about baseball and specifically, the fastball. To me, it feels like once you get into the upper 90s, speed is idolized in a way that may not make sense. Does the actual velocity matter? Is it that much harder to bat against a ball going a mile per hour faster? In essence:

How much better is a 96mph fastball than a 95mph fastball?

 

Capture.JPG

I couldn’t find this anywhere so I took a stab at doing it myself. I looked at three stats:

  • Contact% = Pitches on which contact was made / Swings
  • Swing% = Swings / Pitches
  • SwStr% = Swings and misses / Total pitches

These stats were chosen because they made sense to me and can be calculated from individual pitches, as opposed to needing the data from the entire at-bat.
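As arithmetic on per-pitch counts, the three stats come down to:

```python
def pitch_stats(pitches, swings, contacts):
    """Compute the three rates from raw counts: total pitches,
    swings, and swings on which contact was made."""
    contact_pct = contacts / swings            # Contact%
    swing_pct = swings / pitches               # Swing%
    swstr_pct = (swings - contacts) / pitches  # SwStr%: swings and misses / pitches
    return contact_pct, swing_pct, swstr_pct

# e.g. 100 pitches, 45 swings, 35 with contact
print(pitch_stats(100, 45, 35))
```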

 

I used data from Clayton Kershaw's pitches over his entire MLB career (2008-present). Why Clayton Kershaw? Because I don't know baseball very well and someone told me he throws both fast and a lot. This gave me ~16,000 data points; here are the results:

 


Contact%:

contact_percent.png

Contact%, defined as [Pitches on which contact was made / Swings], goes down, as most people predicted. For every mph of added velocity, the contact percentage drops an average of 2%, which seems significant. In essence, it's harder to make contact with faster pitches.

 


Swing%:

swing_percentage.png

Swing%, defined as [Swings / Pitches], goes up, as everyone I asked predicted. For every mph of added velocity, batters swing about 3% more often. Looking at the full range, his fastest pitches are swung at about 40% more often than his slowest, which is a huge gap. Batters swing more at faster pitches.

 


SwStr%:

swstr_percentage.png

SwStr%, defined as [Swings and misses / Total pitches], also goes up, as everyone I asked predicted. This is the most striking to me: batters swing and miss on average twice as often when the fastball is 95mph compared to when it's 92mph.

 


Conclusion:

Faster fastballs are better. The data we looked at clearly showed that all three of these stats get better for the pitcher as you increase the velocity. And the titular question can be answered now:

A 96mph fastball will be contacted about 2% less often, swung at about 3% more often, and swung on and missed about 1.5% more often than a 95mph one.

 


Notes:

The code can be found here. I used python 3.x with pandas and matplotlib to parse it.

Data points are generated by putting the slowest 200 pitches into a bucket, computing the stat in question for that bucket, and averaging the bucket's velocities. A bucket size of 200 was chosen because it was the smallest number that showed the trends without being unnecessarily noisy.
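The bucketing step can be sketched with pandas on synthetic data; the column names (`velocity`, `made_contact`) are stand-ins for whatever the real export uses:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the real pitch export
df = pd.DataFrame({
    "velocity": rng.normal(92, 2, size=2000),
    "made_contact": rng.random(2000) < 0.6,
})

BUCKET = 200  # pitches per bucket, as in the write-up

# Sort by velocity, then chop into consecutive buckets of 200 pitches
df = df.sort_values("velocity").reset_index(drop=True)
df["bucket"] = df.index // BUCKET

summary = df.groupby("bucket").agg(
    avg_velocity=("velocity", "mean"),
    contact_pct=("made_contact", "mean"),
)
print(summary)
```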

Obvious extensions would be doing this for other pitchers (does it extend beyond 96? Is every pitcher’s graph similar?) and other stats (Z/O-Swing, Z/O-Contact, basically everything here)


A Tale of Two Endpoints

Another week, another riddler. Here is this week’s problem:

You’ve just been hired to work in a juicy middle-management role at Riddler HQ — welcome aboard! We relocated you to a tastefully appointed apartment in Riddler City, five blocks west and 10 blocks south of the office. (The streets of Riddler City, of course, are laid out in a perfect grid.) You walk to work each morning and back home each evening. Restless and inquisitive mathematician that you are, you prefer to walk a different path along the streets each time. How long can you stay in that apartment before you are forced to walk the same path twice? (Assume you don’t take paths that are longer than required, and assume beaucoup bonus points for not using your computer.)

Extra credit: What if you instead took a bigger but more distant apartment, M blocks west and N blocks south of the office?

This problem is pretty easy if you are able to think about it from the right point of view.

What we want is a list of directions, either S (south) or W (west). For example, one valid route is SSSSSSSSSSWWWWW (we can't have E or N because they would make the route longer than necessary). Now we want to count every possible reordering of these letters. From high school math, we know that the number of permutations with repeated elements follows the form:

\frac{N!}{A! \times B! \times C!}

 

Where the set of \text{\small N} letters has \text{\small A} identical items, \text{\small B} identical items, \text{\small C} identical items, etc…

Using this we get our solution to be:

\frac{15!}{10! \times 5!} = \text{\small 3003 trips}
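Since all the S's are identical and all the W's are identical, this is just the binomial coefficient "15 choose 5"; one line of Python confirms it:

```python
import math

# 15 moves total: choose which 5 are west (the rest are south)
print(math.comb(15, 5))  # 3003
```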

 

This can be confirmed with Python, which also gives us this heatmap of the traveler’s path

heatmap4.png

There ya go! With two commutes per day, 3003 routes means you can work 1501 days, or about 4 years!

import matplotlib.pyplot as plt
import seaborn as sns

south = 10
east = 5

distance = south + east
correct_route = []
heatmap = [[0 for e in range(max(south,east)+3)] for s in range(max(south,east)+3)]
for attempt in range(2**distance):
    route = bin(attempt)[2:]

    #if we go east exactly the right number of times then route is "correct". We do this by counting moves east
    easterness = 0
    for element in route: easterness += int(element)
    if easterness == east:

        #bin doesn't prepend 0's... we have to manually
        while len(route) != distance: route = '0'+route

        #Add route, then go through and add locations we step on to heatmap
        correct_route.append(route)
        location = [1,1+south]
        for element in route:
            heatmap[location[1]][location[0]] +=1
            if int(element) == 1: location[0] +=1
            else: location[1] -= 1
        heatmap[location[1]][location[0]] +=1

#Format and plot
with sns.axes_style("white"):
    ax = sns.heatmap(heatmap, vmax=3500, square=True,  cmap="YlGnBu", cbar = True)
    ax.axis('off')
    ax.legend().set_visible(False)

    ax.set_title("Heatmap of Commute")

    plt.show()