First word of Wordle

In the last week, I have started playing the online word game Wordle by Josh Wardle. I was lured in after getting curious about some strange Twitter status updates that showed rows of green, grey and yellow blocks. It turns out it’s a fun game, too.

The basic idea is to guess a five-letter word, and you get six guesses. Each day there is a new word, and everyone guesses the same one. After each guess (which must be an actual word), the letters of the guess are coloured to show how close it was: green (correct letter in the correct position), yellow (letter is in the word but in a different position) or grey (letter is not in the word). After you’ve finished, you can share a status update that shows how well you went, in a way that doesn’t give away any information about the word. That’s what I was seeing on Twitter.

I’ve played four times now, and a natural question is which word to use as the first guess. At that point there is no information about the daily word, so it makes sense to me that the first guess should be the same each day. But what is the best word to use for that first guess?

The conclusion I’ve reached is that the best word should have five different letters which, together, are the five letters most likely to appear in a word, i.e. maximise the chance of getting yellows. Additionally, each letter should ideally be in the position where it is most likely to match, i.e. maximise the chance of getting greens.

To figure this out properly, I would need to know the word list being used by Wordle, which unfortunately I don’t. In fact, there may be two word lists: the word list used to allow guesses, and the word list used to pick the daily word. So, I’ll make a big assumption and use the Collins Scrabble Words list from July 2019.

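My tool of choice is going to be zsh on my MacBook Air, as this doesn’t require anything sophisticated. I’ve also removed any extra headers from my word list and run it through dos2unix to convert Windows (CRLF) line endings to Unix (LF) ones; otherwise the invisible carriage return at the end of each line would be counted as a character by the pattern matching.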
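dos2unix isn’t always installed (on macOS it usually comes from a package manager such as Homebrew). As a minimal sketch under that assumption, tr can do the same job for a plain word list by deleting every carriage return. The function and file names are illustrative, and the approach assumes CRLF endings rather than old Mac-style CR-only ones, which it would collapse into a single line.

```shell
# Minimal sketch (illustrative names): strip Windows CRLF line endings with tr.
# Assumes CRLF endings; a CR-only file would be joined into a single line.
strip_crlf() {
    # tr -d '\r' deletes every carriage-return character from the input
    tr -d '\r' < "$1" > "$2"
}

# Print how many carriage returns a file contains (0 means it is clean).
count_cr() {
    # tr -cd '\r' keeps only carriage returns; wc -c counts them
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
```

For example, strip_crlf words.txt words-unix.txt converts the list, after which count_cr words-unix.txt should print 0.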

The first job is to extract just the five-letter words:

% grep '^.....$' words.txt > words5.txt
%
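The pattern '^.....$' accepts any five characters, including spaces, digits or punctuation. The Collins list should contain only letters, but if a word list might have stray characters or mixed case, a slightly stricter sketch (the function name is illustrative) keeps only lines of exactly five ASCII letters and upper-cases them:

```shell
# Sketch (illustrative name): keep only lines of exactly five ASCII letters,
# converted to upper case.
# grep -E enables extended regular expressions, so {5} means "exactly five".
extract_five() {
    grep -E '^[A-Za-z]{5}$' "$1" | tr '[:lower:]' '[:upper:]'
}
```

Used as extract_five words.txt > words5.txt, it would stand in for the grep above.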

Now we need to figure out how many words each letter of the alphabet appears in:

% for letter in {A..Z}
for> do
for> echo $letter:`grep -c -i $letter words5.txt`
for> done | sort -t : -k 2 -n -r | head -n 10
S:5936
E:5705
A:5330
O:3911
R:3909
I:3589
L:3114
T:3033
N:2787
U:2436
%

That wasn’t very efficient, but it doesn’t need to be. We have our answer: the most popular letters are S, E, A, O and R. Putting these letters into a free online anagram tool shows that there are three words made up of exactly these letters: AEROS, AROSE and SOARE.
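The loop above reads the word list 26 times, once per letter. For a list this size that hardly matters, but for comparison, here is a sketch of the same tally in a single awk pass (the function name is illustrative). Each letter is counted at most once per word, which matches what grep -c reports (lines containing the letter), so a word like EERIE adds one to E, not three:

```shell
# Sketch (illustrative name): for each letter, count the words that contain it.
# A letter is counted at most once per word; output is LETTER:COUNT, highest first.
letter_word_counts() {
    awk '
    {
        word = toupper($0)
        split("", seen)                   # reset the set of letters seen in this word
        for (i = 1; i <= length(word); i++) {
            c = substr(word, i, 1)
            if (c ~ /[A-Z]/ && !(c in seen)) {
                seen[c] = 1
                count[c]++
            }
        }
    }
    END {
        for (c in count) print c ":" count[c]
    }' "$1" | sort -t : -k 2 -n -r
}
```

Running letter_word_counts words5.txt | head -n 10 should produce the same kind of listing as the loop.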

Okay, so while only one of these is a word that you’d actually use, it turns out that Wordle accepts them all. It looks like Wordle might use the Scrabble word list for its guesses.
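The anagram search can also be run against the word list itself, which confirms the words are in it. Sorting a word’s letters gives a signature, and two words are anagrams exactly when their signatures match. A sketch (illustrative name; the letters are sorted with a small insertion sort so it only needs standard awk):

```shell
# Sketch (illustrative name): print every word in a list that is an anagram
# of the given letters. Usage: anagrams_of AEROS words5.txt
anagrams_of() {
    awk -v target="$1" '
    # Return the letters of w, upper-cased and sorted alphabetically.
    function signature(w,    n, i, j, tmp, a, s) {
        w = toupper(w)
        n = length(w)
        for (i = 1; i <= n; i++) a[i] = substr(w, i, 1)
        # insertion sort of the n letters
        for (i = 2; i <= n; i++) {
            tmp = a[i]
            for (j = i - 1; j >= 1 && a[j] > tmp; j--) a[j + 1] = a[j]
            a[j + 1] = tmp
        }
        s = ""
        for (i = 1; i <= n; i++) s = s a[i]
        return s
    }
    BEGIN { want = signature(target) }
    signature($0) == want { print toupper($0) }
    ' "$2"
}
```

anagrams_of AEROS words5.txt prints each anagram found in the list, in list order.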

In any case, this looks like a pretty good set of letters, as almost every word in the word list contains at least one of them:

% grep -c . words5.txt
12972
% grep -c -i -e A -e R -e O -e S -e E words5.txt
12395
%

Of the 12,972 words in the word list, 12,395 (96%) contain at least one of these letters, so at least one letter will match!
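The grep above is wired to one set of letters. To compare other candidates, here is a sketch of a function (illustrative name) that takes any guess, counts the words sharing at least one letter with it, and reports the share as a percentage, the same arithmetic as 12,395 / 12,972 ≈ 95.6%:

```shell
# Sketch (illustrative name): count words sharing at least one letter with the
# guess, as a share of all lines (assumes the list has no blank lines).
# Usage: letter_overlap AEROS words5.txt
letter_overlap() {
    awk -v guess="$1" '
    BEGIN { guess = toupper(guess) }
    {
        total++
        word = toupper($0)
        for (i = 1; i <= length(guess); i++) {
            if (index(word, substr(guess, i, 1)) > 0) { hits++; break }
        }
    }
    END {
        if (total > 0) printf "%d of %d (%.1f%%)\n", hits, total, 100 * hits / total
    }' "$2"
}
```

On the list above, letter_overlap AEROS words5.txt should report the same 12,395 as the grep.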

The next job is to figure out which of these three words is most likely to have letters in the same position as other words in the word list.

% grep -c -e A.... -e .E... -e ..R.. -e ...O. -e ....S words5.txt 
6578
% grep -c -e A.... -e .R... -e ..O.. -e ...S. -e ....E words5.txt
3742
% grep -c -e S.... -e .O... -e ..A.. -e ...R. -e ....E words5.txt
5726
%

We have a winner! At least one letter of AEROS is in the right position for 6,578 words (51%).
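The three grep commands follow one rule: letter i of the guess must sit at position i of the word. Here is a sketch of a function (illustrative name) applying that rule to any guess; as with grep -c, each word is counted once however many of its letters line up:

```shell
# Sketch (illustrative name): count words with at least one letter in the same
# position as in the guess. Each matching word is counted once, like grep -c
# with several -e patterns. Usage: position_matches AEROS words5.txt
position_matches() {
    awk -v guess="$1" '
    BEGIN { guess = toupper(guess) }
    {
        word = toupper($0)
        for (i = 1; i <= length(guess); i++) {
            if (substr(word, i, 1) == substr(guess, i, 1)) { hits++; break }
        }
    }
    END { print hits + 0 }' "$2"
}
```

position_matches AEROS words5.txt should agree with the first of the three grep counts above.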

So, it looks like AEROS is a pretty good choice for your first guess in Wordle. Just don’t tell anyone that’s what you’re doing: anyone who knows your first guess can read the first row of your shared Wordle status update as a clue, so it will actually contain spoilers.
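The reasoning about greens can be pushed a step further by tallying which letters are most common in each of the five positions, the kind of first-letter and last-letter breakdown discussed in the comments. A sketch (illustrative name and output format):

```shell
# Sketch (illustrative name): for one position (1-5), count how often each
# letter appears there. Output is LETTER:COUNT, most frequent first.
# Usage: position_counts 1 words5.txt | head -n 5
position_counts() {
    awk -v pos="$1" '
    {
        c = toupper(substr($0, pos, 1))
        if (c ~ /^[A-Z]$/) count[c]++
    }
    END { for (c in count) print c ":" count[c] }' "$2" | sort -t : -k 2 -n -r
}
```

position_counts 1 words5.txt lists first letters and position_counts 5 words5.txt lists last letters.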

2 thoughts on “First word of Wordle”

  1. Hi Andrew,
    As you’ve seen from me under different cover, we have a precise list of the 2,315 words used in the puzzle as solutions, allowing a precise ordering of the letter frequency distribution, which runs as follows:
    EAROT LISNC UYDHP MGBFK WVZ XQJ
    In terms of matches to the word list, owing to letter duplications (a five-letter word can have two or three letters the same) this re-orders slightly:
    EAROT LISNU CYHDP GMBFK WVX ZQJ
    There are several good combinations of letters, and surprisingly the one using the first five letters EAROT (oater, orate, roate) isn’t actually the best, although it makes 2,120 partial matches to the 2,315 words of the wordlist. The letter mix of EAROS (aeros, arose, soare) matches 2,132 words, and the mix of EARIS (aesir, arise, raise, serai) matches 2,147 words.
    As far as exact letter matches go, the word soare has exact matches with 1,166 words, or over half of the wordlist. This is because S is the most frequent first letter found in English words by a considerable margin; and E is very frequently a final letter.

    First letter frequencies:
    SCBTP AFGDM RLWEH VONIU QKJ YZ (no X words!)

    Last letter frequencies:
    EYTRL HNDKA OPMGS CFWIB XZU (no words ending in J, Q, or V)

    I haven’t looked at common digraphs except for adjacent double letters, which in order of most frequent to least are:
    LEOST FRNDP MBGZC IV (and no double letters for any others)

    At any rate, until someone does a much more exhaustive comparison of possible guessing words (a supplementary list of 10,657 words can be legitimately entered as guesses), soare would have strong claims as the most useful first word for Wordle.

    1. Given what you shared about S so rarely being the last letter in the Wordle answer word list, I’ve moved from AEROS to SOARE as my standard first guess. However, I’m now thinking that I should have a standard first and second guess. It’s not in the spirit of “hard mode”, but there’s a trade-off between finishing the puzzle as quickly as possible (minimise the average number of guesses) and finishing it as reliably as possible (maximise the chance of guessing the answer within six attempts). Having now failed to guess the answer one day, I’d rather follow the latter strategy than the former.
