I recently saw a Facebook post about a study done at Cambridge University. It claims that people can figure out the meaning of a scrambled word without much trouble as long as the first and last letters are in the right place. Here is the original post:
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in
waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht
the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl
mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the
huamn mnid deos not raed ervey lteter by istlef, but... the wrod as a
wlohe. Amzanig huh.
I'm really amazed by the human brain's ability to reconstruct and figure things out from little, and sometimes pretty fuzzy, information. It makes me think there may be a lot of redundancy built into the English language. The following YouTube video demonstrates this ability in the extreme: a Wheel of Fortune contestant was able to solve the puzzle with only one letter revealed.
You can actually see (or hear) that the host was in shock for a moment when she asked "Can I solve?". Truly amazing. In fact, in a later interview she said she had already pretty much guessed the answer before she even asked for the letter "L"; she only needed the "L" to confirm her guess.
I didn't bother to look up the original research, since the scrambling algorithm seems pretty obvious. It only took me a little while to roll out my own version of a readable word scrambler as a simple web application, which you can try out at the end of this post. With the "More readable" option, I swap two randomly selected interior letters once. With the "Less readable" option, I repeat the swap several times, with the number of swaps depending on the length of the word.
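The swapping scheme described above can be sketched in a few lines of Python. This is not a port of my C# project, just a minimal illustration under two assumptions of mine: a "word" is a run of ASCII letters (so punctuation and apostrophes are left alone), and "Less readable" performs one swap for every interior letter beyond the first, since the exact count is a design choice. The names `scramble_word` and `scramble_text` are hypothetical.

```python
import random
import re
from typing import Optional


def scramble_word(word: str, swaps: int, rng: random.Random) -> str:
    """Swap two random interior letters `swaps` times; first and last letters never move."""
    if len(word) < 4:
        # Fewer than two interior letters: there is nothing to swap.
        return word
    letters = list(word)
    for _ in range(swaps):
        # Pick two distinct interior positions (never index 0 or the last index).
        i, j = rng.sample(range(1, len(word) - 1), 2)
        letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def scramble_text(text: str, less_readable: bool = False, seed: Optional[int] = None) -> str:
    """Scramble every run of ASCII letters in `text`, leaving everything else untouched."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # "More readable": a single swap.
        # "Less readable" (assumption): len(word) - 3 swaps, so longer words get more swaps.
        swaps = max(1, len(word) - 3) if less_readable else 1
        return scramble_word(word, swaps, rng)

    return re.sub(r"[A-Za-z]+", replace, text)


if __name__ == "__main__":
    sentence = "According to a researcher at Cambridge University, it doesn't matter."
    print(scramble_text(sentence, seed=1))
    print(scramble_text(sentence, less_readable=True, seed=1))
```

Because each swap exchanges two letters, the result is always a rearrangement of the original interior letters. Note that random swaps can occasionally undo each other, so "Less readable" only tends to scramble more; it does not guarantee it.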
What could this research be used for, other than revealing the amazing ability of the human brain to process fuzzy information? My immediate thought is some kind of word-level lossy text compression. Since we don't need the exact order of letters in a word to know its meaning, we can afford to lose that information in exchange for smaller storage and faster transmission. I think it's quite a novel idea, as traditional text compression methods are almost always lossless, which means they do not tap into the powerful fuzzy processing ability of the human brain. Theoretically, we can estimate the encoding space of our lossy compression compared to that of a lossless compression as follows:
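- $C(n, r) = \frac{n!}{r!\,(n-r)!}$: the number of combinations of n distinct letters taken r at a time. The order of the r letters is not important (lossy compression).
- $P(n, r) = \frac{n!}{(n-r)!}$: the number of permutations of n distinct letters taken r at a time. The order of the r letters IS important (lossless compression).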
Dividing the number of combinations by the number of permutations gives 1/r!, regardless of n. So, for example, to encode a 2-letter sequence, the theoretical encoding space of our lossy compression is 50% (or 1/2) of that of a lossless compression. To encode a 3-letter sequence, the encoding space is only 17% (or 1/6) of that of a lossless compression. Note that since many letter sequences cannot form any valid word, the actual encoding space is much smaller than the theoretical one.
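The estimate can be checked numerically, and the per-word idea made concrete, with a small Python sketch. It assumes a 26-letter alphabet. `canonical_form` and `bits_discarded` are hypothetical helpers, not part of any existing compressor: `canonical_form` keeps the first and last letters and sorts the interior, so every scramble of a word maps to the same string, and `bits_discarded` estimates how much ordering information that throws away (log2 of the number of distinct interior orderings, which also accounts for repeated letters, unlike the distinct-letter formulas).

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, log2, perm


def lossy_to_lossless_ratio(n: int, r: int) -> Fraction:
    """C(n, r) / P(n, r): unordered vs. ordered choices of r distinct letters (equals 1/r!)."""
    return Fraction(comb(n, r), perm(n, r))


def canonical_form(word: str) -> str:
    """Hypothetical lossy encoding: keep first and last letters, sort the interior letters."""
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]


def bits_discarded(word: str) -> float:
    """log2 of the number of distinct interior orderings that collapse to one canonical form."""
    inner = word[1:-1] if len(word) > 2 else ""
    orderings = factorial(len(inner))
    for count in Counter(inner).values():
        # Repeated letters reduce the number of distinct orderings (multinomial coefficient).
        orderings //= factorial(count)
    return log2(orderings)


if __name__ == "__main__":
    for r in range(2, 6):
        ratio = lossy_to_lossless_ratio(26, r)
        print(f"r={r}: C={comb(26, r)}, P={perm(26, r)}, ratio={ratio} ({float(ratio):.1%})")
    for w in ("Cambridge", "Cmabrigde", "University", "Uinervtisy"):
        print(w, canonical_form(w), f"{bits_discarded(w):.2f} bits")
```

For r = 3 the first loop reports a ratio of 1/6 (16.7%), matching the 17% figure, and each original word shares its canonical form with its scrambled version from the quoted post.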
While the practicality of lossy text compression may be in doubt, you can surely piss off your Facebook or Twitter friends by posting some scrambled updates.
You can download the C# source code of the Word Scrambler project below:
WordScrambler.zip (2.45 KB)