First word of Wordle

In the last week, I have started playing the online word game Wordle by Josh Wardle. I was lured in after getting curious about some strange Twitter status updates that showed rows of green, grey and yellow blocks. It turns out it’s a fun game, too.

The basic idea is to try to guess a five-letter word, and you get six guesses. Each day there is a new word, and everyone gets to guess the same one. After each guess (which must be an actual word), you get some information on how close the guess was because the letters in a guess are shown as green (correct letter in correct position), yellow (correct letter in incorrect position) or grey (incorrect letter). After you’ve finished guessing the word, you can share a status update that shows how well you went, in a way that doesn’t give away any information about the word. That’s what I was seeing on Twitter.
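To make the colouring rules concrete, here is a rough sketch of how a guess could be scored. The `score` function is my own hypothetical helper, not anything taken from Wordle, and the way it treats repeated letters (each letter of the answer accounts for at most one mark, greens first) is an assumption on my part.

```shell
# score GUESS ANSWER: print one mark per letter of GUESS.
# G = green, Y = yellow, - = grey.
# Assumption: each letter of the answer can account for at most one
# mark, and greens are handed out before yellows.
score() {
  awk -v g="$1" -v a="$2" 'BEGIN {
    g = toupper(g); a = toupper(a); n = length(g)
    # First pass: mark greens and count the unmatched answer letters.
    for (i = 1; i <= n; i++) {
      if (substr(g, i, 1) == substr(a, i, 1)) {
        mark[i] = "G"
      } else {
        mark[i] = "-"
        left[substr(a, i, 1)]++
      }
    }
    # Second pass: a non-green letter turns yellow while copies remain.
    for (i = 1; i <= n; i++) {
      c = substr(g, i, 1)
      if (mark[i] != "G" && left[c] > 0) {
        mark[i] = "Y"
        left[c]--
      }
    }
    out = ""
    for (i = 1; i <= n; i++) out = out mark[i]
    print out
  }'
}
```

For example, `score AROSE SOARE` prints `YYYYG`: the final E is in the right place, and the other four letters are present but misplaced.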

I’ve played four times now, and a natural question is which word to use as the first guess. At that point there is no information about the daily word, so it makes sense to me that the first guess should be the same each day. But which word is best for that first guess?

The conclusion I’ve reached is that the best word should have five different letters which, together, are the five letters most likely to appear in a word, i.e. maximise the chance of getting yellows. Additionally, each of those letters should ideally sit in the position where it is most likely to match, i.e. maximise the chance of getting greens.

To figure this out properly, I would need to know the word list being used by Wordle, which unfortunately I don’t. In fact, there may be two word lists: the word list used to allow guesses, and the word list used to pick the daily word. So, I’ll make a big assumption and use the Collins Scrabble Words from July 2019.

My tool of choice is going to be zsh on my MacBook Air. It doesn’t require anything sophisticated. Also, I’ve removed any extra headers from my word list, and run it through dos2unix to ensure proper end-of-line treatment.

The first job is to extract just the five-letter words:

% grep '^.....$' words.txt > words5.txt

Now we need to figure out how many words each letter of the alphabet appears in:

% for letter in {A..Z}
for> do
for> echo "$letter:$(grep -c -i "$letter" words5.txt)"
for> done | sort -t : -k 2 -n -r | head -n 10

That wasn’t very efficient, but it doesn’t need to be. We have our answer – the most popular letters are S, E, A, O and R. Putting these letters into a free, online anagram tool, it turns out that there are three words made up from these letters: AEROS, AROSE and SOARE.
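The anagram step could also be done with grep instead of an online tool. This is only a sketch: `aeors_anagrams` is a name I made up, and it assumes the word list has one word per line. It first keeps the five-character lines built only from those letters, then insists that each letter is actually present, which leaves exactly the anagrams.

```shell
# Hypothetical helper: list the words in the given file that are
# anagrams of the letters A, E, O, R and S.
aeors_anagrams() {
  # Keep lines that are exactly five characters, all drawn from AEORS...
  grep -i -x '[AEORS]\{5\}' "$1" |
    # ...then require each of the five letters to appear at least once.
    grep -i A | grep -i E | grep -i O | grep -i R | grep -i S
}
```

Running `aeors_anagrams words5.txt` should list whichever of those anagrams are in your word list, which may differ from what an online tool reports if it uses a different dictionary.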

Okay, so while only one of these is a word that you’d actually use, it turns out that Wordle accepts them all. It looks like Wordle might use the Scrabble word list for its guesses.

In any case, this looks like a pretty good set of letters, as the words in the word list are highly likely to contain at least one of them:

% grep -c . words5.txt
% grep -c -i -e A -e R -e O -e S -e E words5.txt

Of the 12,972 words in the word list, 12,395 (96%) will have at least one letter match!
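The two counts and the percentage can be worked out in one go. This is a small sketch with a hypothetical `coverage` function of my own; it assumes the file is not empty, since the percentage divides by the total.

```shell
# coverage LETTERS FILE: print how many lines of FILE contain at least
# one of LETTERS, the total number of non-empty lines, and the
# percentage. Assumes FILE has at least one non-empty line.
coverage() {
  total=$(grep -c . "$2")
  hits=$(grep -c -i "[$1]" "$2")
  awk -v h="$hits" -v t="$total" \
    'BEGIN { printf "%d of %d (%.0f%%)\n", h, t, 100 * h / t }'
}
```

With the counts above, 12,395 out of 12,972 works out to about 95.6%, which is where the 96% comes from. To try it on the word list:

% coverage AEORS words5.txt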

The next job is to figure out which of these three words is most likely to have letters in the same position as other words in the word list.

% grep -c -e A.... -e .E... -e ..R.. -e ...O. -e ....S words5.txt 
% grep -c -e A.... -e .R... -e ..O.. -e ...S. -e ....E words5.txt
% grep -c -e S.... -e .O... -e ..A.. -e ...R. -e ....E words5.txt

We have a winner! At least one letter of AEROS is in the right position for 6,578 words (51%), the highest count of the three.
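Rather than typing out one grep per candidate, the same check can be written once and reused. The sketch below swaps the grep patterns for a short awk comparison; `position_hits` is a hypothetical name, and unlike the greps above it ignores case, so the counts should only agree if the word list is all uppercase.

```shell
# position_hits WORD FILE: count the lines of FILE that share at least
# one letter with WORD in the same position (ignoring case).
position_hits() {
  awk -v w="$1" '
    BEGIN { w = toupper(w) }
    {
      line = toupper($0)
      # Stop at the first matching position so each line counts once.
      for (i = 1; i <= length(w); i++) {
        if (substr(line, i, 1) == substr(w, i, 1)) {
          hits++
          break
        }
      }
    }
    END { print hits + 0 }
  ' "$2"
}
```

It can then be run over all three candidates:

% for word in AEROS AROSE SOARE; do echo "$word: $(position_hits $word words5.txt)"; done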

So, it looks like using AEROS as your first guess in Wordle is a pretty good choice. Just don’t tell anyone that’s what you’re doing; otherwise, the standard Wordle status update you share will actually contain spoilers.