🐵

Infinite Monkey Theorem

Week 14


From Wikipedia: "The infinite monkey theorem states that a monkey hitting keys independently and at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, including the complete works of William Shakespeare."

While we don't expect to find the complete works of Shakespeare in this text, we have had monkeys type 10 million characters.

Your job is to find all valid words from this dictionary that appear in the text, and to output statistics on how often words of each length appear in the text.

Note: this is a stricter word list than in previous challenges - so make sure to download this one rather than reusing a word list from another challenge.

The expected output for the random input text "thisabcisdefmonkeyghibusiness" is shown below:

--- 1-Letter Words ---
Total Count: 5
Unique Count: 2
Unique Words: a i
--- 2-Letter Words ---
Total Count: 8
Unique Count: 6
Unique Words: hi in is ne on us
--- 3-Letter Words ---
Total Count: 4
Unique Count: 4
Unique Words: bus his key sin
--- 4-Letter Words ---
Total Count: 3
Unique Count: 3
Unique Words: monk sine this
--- 5-Letter Words ---
Total Count: 1
Unique Count: 1
Unique Words: sines
--- 6-Letter Words ---
Total Count: 1
Unique Count: 1
Unique Words: monkey
--- 8-Letter Words ---
Total Count: 1
Unique Count: 1
Unique Words: business
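A minimal Python sketch of printing results in this format, assuming the statistics are held in a dict that maps each word length to a (total count, set of unique words) pair, and that each heading and field goes on its own line. `format_report` and `sample_stats` are illustrative names, not part of the challenge; `sample_stats` simply hard-codes the figures from the sample output.

```python
def format_report(stats):
    """Render per-length word statistics in the challenge's output format.

    `stats` maps word length -> (total_count, set_of_unique_words).
    Lengths with no occurrences are simply absent, so they are never printed.
    """
    lines = []
    for length in sorted(stats):  # shortest word length first
        total, unique = stats[length]
        lines.append(f"--- {length}-Letter Words ---")
        lines.append(f"Total Count: {total}")
        lines.append(f"Unique Count: {len(unique)}")
        # For lowercase a-z words, sorted() gives alphabetical order.
        lines.append("Unique Words: " + " ".join(sorted(unique)))
    return "\n".join(lines)


# Figures copied from the sample output for "thisabcisdefmonkeyghibusiness".
sample_stats = {
    1: (5, {"a", "i"}),
    2: (8, {"hi", "in", "is", "ne", "on", "us"}),
    3: (4, {"bus", "his", "key", "sin"}),
    4: (3, {"monk", "sine", "this"}),
    5: (1, {"sines"}),
    6: (1, {"monkey"}),
    8: (1, {"business"}),
}

print(format_report(sample_stats))
```

Because the loop only visits lengths that are keys of `stats`, lengths with no matches (7 here) are skipped without any special-casing.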

A few things to note:

"Total Count" refers to the overall number of occurrences of words of that length - e.g. for 1-letter words, there was 1 "a" and 4 "i" occurrences, giving a total count of 5 but a unique word count of 2.
The unique words should be displayed in alphabetical order - e.g. "a" is displayed before "i" for 1-letter words.
The dictionary contains words much longer than 8 letters, so don't assume you only have to check up to 8 letters. Word lengths with no occurrences in the random text should be left out of the output - for example, there were no 7-letter or 9-letter words above, so those lengths are not listed.
Overlaps should be counted - for example, if "aa" were a valid 2-letter word, the string "aaaaa" would contain 4 instances of the 2-letter substring "aa".
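The overlap rule matters in practice because many built-in substring counters skip overlaps. As a sketch (the helper name `count_overlapping` is hypothetical), a sliding window that advances one character at a time counts every match, while Python's `str.count` only counts non-overlapping ones:

```python
def count_overlapping(text, word):
    """Count occurrences of `word` in `text`, overlapping matches included."""
    n = len(word)
    # Try every start position, so matches that share characters
    # (e.g. "aa" at index 0 and index 1 of "aaa") are each counted.
    return sum(1 for i in range(len(text) - n + 1) if text[i:i + n] == word)


print(count_overlapping("aaaaa", "aa"))  # 4
print("aaaaa".count("aa"))  # 2, because str.count skips overlapping matches
```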

For this challenge, along with the dictionary linked at the top of the page, use this random monkey string of 10 million characters as your program's input.

Hints

Hints will be released at the start of each of the following days - e.g. day 3 starts 48 hours after the challenge starts.

Release Day	Hint
2	There are many ways you could attempt this challenge - you could use some kind of object/dictionary/hashmap/associative array (or whatever your programming language calls it) mapping each word length to a total word count and a set of unique words.
3	Then read the list of valid words one by one into a set - it will probably be useful to store the length of the longest word.
5	You can start on the first character of the random string, check all substrings from length 1 to the maximum word length, and see whether any of them exist in the valid word set - if they do, add them to the set of words of that length, update the total count, and so on; then repeat from each following character.
6
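Putting the hints for days 2, 3 and 5 together, a straightforward Python sketch might look like the following. It assumes the dictionary has one word per line, and `load_words`, `find_words` and `sample_words` are illustrative names; `sample_words` is a tiny stand-in dictionary, not the challenge's word list.

```python
from collections import defaultdict


def load_words(lines):
    """Read valid words into a set and record the longest word length.

    Assumes one word per line; blank lines are skipped and words lowercased.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word:
            words.add(word)
    max_len = max((len(w) for w in words), default=0)
    return words, max_len


def find_words(text, words, max_len):
    """Check every substring of length 1..max_len starting at each position.

    Returns {length: [total_count, set_of_unique_words]}. Overlapping
    matches are counted because every start position is examined.
    """
    stats = defaultdict(lambda: [0, set()])
    for start in range(len(text)):
        # Near the end of the text, fewer than max_len characters remain.
        for length in range(1, min(max_len, len(text) - start) + 1):
            candidate = text[start:start + length]
            if candidate in words:
                entry = stats[length]
                entry[0] += 1
                entry[1].add(candidate)
    return stats


sample_words = [
    "a", "i", "hi", "in", "is", "ne", "on", "us",
    "bus", "his", "key", "sin", "monk", "sine", "this",
    "sines", "monkey", "business",
    "zebra", "typewriter", "shakespeare",  # absent from the sample text
]
words, max_len = load_words(sample_words)
stats = find_words("thisabcisdefmonkeyghibusiness", words, max_len)
for length in sorted(stats):
    total, unique = stats[length]
    print(length, total, sorted(unique))
```

Since `load_words` accepts any iterable of lines, an open file works too (e.g. `with open("words.txt") as f: words, max_len = load_words(f)`, where the file name is hypothetical). On 10 million characters this performs up to roughly 10 million × `max_len` set lookups, which can be slow in pure Python; a prefix trie that stops extending a substring once no dictionary word begins with it is a common speed-up, though it is not what the hints describe.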