Four years ago I wrote part one of Panning for Pangrams. I searched about a million tweets for a new pangram (think “The quick brown fox jumps over the lazy dog”). My goal was to find out whether someone had accidentally tweeted a more efficient pangram: basically an applied infinite monkey theorem.
If you didn’t read the initial post, don’t worry. Here are the rules:
- The tweet cannot include a username (@BuildABarr for example) or URL (http://anything.com)
- The tweet gets 9 points for each unique alphabetical character. This incentivizes including more of the alphabet
- The tweet gets -1 point for each repeated character and for each non-alphabetical character other than whitespace. This means numbers or symbols hurt the score. This incentivizes coherence
- I get veto power. I really only used this on people who tweeted the entire alphabet, because I guess that’s a thing
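To make the scoring concrete, here is a minimal sketch of the rules as a single function. The name `score_tweet` and the choice to return 0 points for a disqualified tweet are my own assumptions for illustration; the veto rule is a manual step and is not modeled.

```python
import re
import string


def score_tweet(text):
    """Return (points, unique_letter_count) for one tweet.

    Hypothetical helper: a tweet containing a username or URL is
    treated as disqualified and scores (0, 0).
    """
    # Rule 1: no usernames (@...) or URLs.
    if re.search(r"@|https?://", text):
        return 0, 0

    # Whitespace is free, so drop it before counting anything.
    compact = "".join(text.upper().split())

    # Rule 2: each distinct letter A-Z earns 9 points.
    unique_letters = set(compact) & set(string.ascii_uppercase)

    # Rule 3: every other character (a repeated letter, digit,
    # or symbol) costs 1 point.
    penalty = len(compact) - len(unique_letters)

    return 9 * len(unique_letters) - penalty, len(unique_letters)


print(score_tweet("The quick brown fox jumps over the lazy dog"))  # (225, 26)
```

For reference, a perfect pangram (26 letters, no repeats) would score 26 × 9 = 234, and the quick brown fox loses 9 points to its 9 repeated letters.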
I then searched 9 million tweets from this database. Below are some tweets that score well:
with 24 characters for 188 points
with 24 characters for 193 points
with 24 characters for 195 points
So there is lots of room for improvement!
<pre>import os
import re

path = os.path.normpath("D:/test_set_tweets.txt")

def run_data_logger():
    # errors='ignore' skips any bytes that cannot be decoded
    with open(path, "r", errors='ignore') as fp:
        for line in fp:
            line_data = line.split("\t")
            if len(line_data) > 2:
                # assumption: the tweet text is the third tab-separated field
                tweet = line_data[2]
                text = tweet
                # disqualify tweets containing a URL or a username
                if "http:" in text or "@" in text:
                    text = ""
                text = text.upper()  # uppercase all to avoid doubling up
                text = re.sub('[ ]', "", text)  # remove spaces
                char_count = len(text)
                text = ''.join(set(text))  # remove repeated characters
                text = re.sub('[^A-Za-z]', "", text)  # delete non-letters
                # 9 points per unique letter, -1 for every other character
                points = 9*len(text) - (char_count - len(text))
                if points > 185 and len(text) >= 23:
                    print("Points: ", points, " chars: ", len(text), " Max tweet: ", tweet)

run_data_logger()</pre>