🐵
Infinite Monkey Theorem
Week 14
From Wikipedia: "The infinite monkey theorem states that a monkey hitting keys independently and at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, including the complete works of William Shakespeare"
While we don't expect to find the entire works of Shakespeare in this text, we have had monkeys type 10 million characters.
Your job is to find all valid words from this dictionary that appear in the text, and to output statistics on the word-length frequencies of the words found.
Note: this is a stricter word list than previous challenges - so make sure to download this one, rather than using a word list from another challenge
The expected output for the random input text "thisabcisdefmonkeyghibusiness" is shown below.
A few things to note:
- "Total Count" refers to the overall number of occurrences of words of that length - e.g. for 1-letter words, "a" occurred once and "i" occurred 4 times, giving a total count of 5 but a unique word count of 2
- The unique words should be displayed in alphabetical order - i.e. "a" is displayed before "i" for one letter words
- The dictionary contains words much longer than 8 letters, so don't assume you only have to check up to 8 letters
- Word lengths with no occurrences in the random text don't need to be included in the output - for example, in the expected output there were no 7-letter words, no 9-letter words etc
- Overlaps should be counted - for example, assume "aa" was a valid 2-letter word - the string "aaaaa" would hence contain 4 instances of the 2-letter substring "aa"
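
The overlap rule is easy to get wrong, so here is a minimal sketch of counting overlapping occurrences of a single word. The function name `count_overlapping` is hypothetical and only for illustration. Note that Python's built-in `str.count` counts non-overlapping matches only, so it would report 2 rather than 4 for the "aaaaa" example.

```python
def count_overlapping(text: str, word: str) -> int:
    """Count occurrences of `word` in `text`, allowing matches to overlap."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        # Move forward by one character (not by len(word)),
        # so that a match starting inside the previous match is still found.
        start = text.find(word, start + 1)
    return count


# The "aaaaa" example from the note above, assuming "aa" is a valid word.
print(count_overlapping("aaaaa", "aa"))  # 4
# The 1-letter word "i" in the sample input text.
print(count_overlapping("thisabcisdefmonkeyghibusiness", "i"))  # 4
```

Searching for each dictionary word in turn like this is only meant to show the counting rule; it is not necessarily a sensible approach for the full 10 million character input.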
For this challenge, along with the dictionary link provided at the top of the page, you should use this random monkey string of 10 million characters as your program's input
Hints
Hints will be released at the start of each of the following days - e.g. the start of day 3 is 48 hours after the challenge starts
| Release Day | Hint |
|---|---|
| 2 | There are many ways you could attempt this challenge - you could use some kind of object/dictionary/hashmap/associative array (or whatever your programming language calls it) mapping a word length to total word counts and a set of unique words |
| 3 | Then just read the list of valid words one by one into a set - it will probably be useful to store the length of the longest word |
| 5 | You can start on the first character of the random string, check all substrings from length 1 to the maximum word length and see if any of these strings exist within the valid word set - if they do, add them to your list of words of that length, update the total count etc |
| 6 |
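
The approach described in hints 2, 3 and 5 could be sketched roughly as follows. This is one possible reading of the hints, not an official solution: the function names, the lowercasing of dictionary words, the tiny word list in the demo, and the report layout are all assumptions made for illustration, since the real dictionary and the exact expected output format are not reproduced here.

```python
from collections import defaultdict


def load_words(lines):
    """Hint 3: read valid words into a set and remember the longest length."""
    # Assumption: one word per line; words are lowercased to match the text.
    words = {line.strip().lower() for line in lines if line.strip()}
    max_len = max((len(w) for w in words), default=0)
    return words, max_len


def scan(text, words, max_len):
    """Hints 2 and 5: map each word length to a total count and a set of unique words."""
    stats = defaultdict(lambda: {"total": 0, "unique": set()})
    n = len(text)
    for start in range(n):
        # Try every substring beginning at `start`, up to the longest word length.
        for length in range(1, min(max_len, n - start) + 1):
            candidate = text[start:start + length]
            if candidate in words:
                stats[length]["total"] += 1
                stats[length]["unique"].add(candidate)
    return stats


def report(stats):
    """Format one line per word length that occurred (hypothetical layout)."""
    lines = []
    for length in sorted(stats):
        unique = sorted(stats[length]["unique"])  # alphabetical order
        lines.append(
            f"{length}-letter words: total count {stats[length]['total']}, "
            f"unique {len(unique)}: {', '.join(unique)}"
        )
    return lines


# Demo with a made-up word list; the real challenge dictionary will differ.
demo_words, demo_max = load_words(["a", "i", "is", "this", "key", "monkey", "business"])
demo_stats = scan("thisabcisdefmonkeyghibusiness", demo_words, demo_max)
for line in report(demo_stats):
    print(line)
```

Because every start position is checked against every length up to the maximum, overlapping matches are counted automatically, and lengths with no matches never get an entry in `stats`, so they are left out of the report. On the full 10 million character input this does roughly 10 million times the maximum word length set lookups, which may be slow in pure Python but stays within the approach the hints describe.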